Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite of NCM device driver #2227

Merged
merged 20 commits into from May 7, 2024
Merged

Rewrite of NCM device driver #2227

merged 20 commits into from May 7, 2024

Conversation

rgrr
Copy link
Contributor

@rgrr rgrr commented Aug 21, 2023

Describe the PR
This is a rewrite of the original NCM device driver. You can check tests concerning functionality and performance here.

Additional context
Buggyness of the current driver can be easily verified with for MSS in 100 200 400 800 1200 1450 1500; do iperf -c 192.168.14.1 -e -i 1 -M $MSS -l 8192 -P 1; sleep 2; done.

I'm using the driver in my own project. The USB device is running together with Linux/Win10 without any problems.

This solves #2068.

#endif

#define TU_LOG_DRV(...) TU_LOG(CFG_TUD_NCM_LOG_LEVEL, __VA_ARGS__)
#if 0
#define INFO_OUT(...) printf(__VA_ARGS__)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we us the debug macros defined in tusb_debug.h ?

src/class/net/ncm.h Show resolved Hide resolved
src/class/net/ncm.h Show resolved Hide resolved


static void notification_xmit(uint8_t rhport, bool force_next)
/**

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make comment format consistent with tinyUSB ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that block comments looks much cleaner than inline comment format used in TinyUSB 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure about this: what are the TinyUSB comment conventions?

.wNtbOutMaxDatagrams = 6 // 0=no limit
};

// Some confusing remarks about wNtbOutMaxDatagrams...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove redundant Debug comments

@HiFiPhile
Copy link
Collaborator

HiFiPhile commented Apr 28, 2024

Hi, sorry for huge delay. I'm not an expert of NCM class but I'd like to move the things on.

Compared to RNDIS current NCM implementation is problematic, and new implementation doesn't perform well at high MSS on Linux, with STM32F723:

MSS vs Mbit/s RNDIS NCM with #2564 NCM New
100 7.25 7.04 10.2
200 14.5 6.95 12.5
400 21.4 6.75 6.71
800 34.8 6.97 8.02
1450 34.8 8.17 8.00

On the bus I observed high delay between OUT and IN transfer, also between 2 OUT transfers for both NCM and NCM New.
It could also be a linux tweaking issue, on Windows 10 I got a throughput of ~30Mbps without these high delay.
image

I'll try to review your implementation, below is the USB capture in case if you're interested.
ncm_new.zip

@rgrr
Copy link
Contributor Author

rgrr commented Apr 29, 2024

On the bus I observed high delay between OUT and IN transfer, also between 2 OUT transfers for both NCM and NCM New. It could also be a linux tweaking issue, on Windows 10 I got a throughput of ~30Mbps without these high delay

Hello HiFiPhile

very interesting observation. Could you do the little benchmark comparison also with Win10.

@HiFiPhile
Copy link
Collaborator

HiFiPhile commented Apr 29, 2024

very interesting observation. Could you do the little benchmark comparison also with Win10.

It seems like iperf on Windows doesn't support MSS adjustment, I see no difference in captured packet.

On Windows there is only one TCP ACK packet each time instead of 2, also delay between transfer is reduced. I got a throughput of 24Mbps.
image

Here is the USB capture :
ncm_enum win.zip


One thing strange is on USB bus ACK seq number is shifted from data packet, or maybe it's a decoding issue.

image
image

Here is the Netif capture on Linux:
netif.zip

@rgrr
Copy link
Contributor Author

rgrr commented May 1, 2024

Hmmm... don't know what to do here. Unfortunately I don't have any Hi-Speed device in my drawer, so incapable of checking the above.

At least I have a project using NCM which also works in conjunction with Win10. Speed is comparable. My major concern was compatibility.

@HiFiPhile
Copy link
Collaborator

HiFiPhile commented May 1, 2024

I was able to massively boost the performance simply by increasing TCP_WND !

MSS vs Mbps default x2 x4
100 10.2 10.3 10.4
200 12.5 18.9 19.2
400 6.71 24.9 32
800 8.02 12.5 46.6
1450 8.00 14.8 70.4

MSS=1450 comparaison

  • TCP_WND = TCP_MSS

Instead of one complete datagram 2 segments are sent, both are acked, then high delay (~1ms) until next transfer.
image

  • TCP_WND = 2*TCP_MSS

Two complete datagrams are sent, both are acked, then high delay (~1ms) until next transfer.
image

  • TCP_WND = 4*TCP_MSS

Two complete datagrams are sent, only 1 ack, ack delay is a little bit higher maybe for processing, no delay until next transfer.
image


It makes me to believe there must be some relationships between TCP window and NTB size to make it works better, so I decreased NTB size to 1600 while keeping TCP_WND = 2*TCP_MSS:

MSS vs Mbps default x2 x4 x2; NTB 1600
100 10.2 10.3 10.4 10.7
200 12.5 18.9 19.2 20.6
400 6.71 24.9 32 38.4
800 8.02 12.5 46.6 44.8
1450 8.00 14.8 70.4 68

2 Packets are sent and 2 ack, no delay until next transfer.
image

Below are all USB captures:
tcp_wnd_test.zip


Bonus: With #2576 applied, I got 116Mbps @ MSS=1450, TCP_WND = 4*TCP_MSS !

@rgrr
Copy link
Contributor Author

rgrr commented May 1, 2024

Wow... those are great numbers. Out of curiosity: did you test different TCP_WND with RNDIS as well?

And the other point: anything I can do on the driver side to enhance throughput?

@HiFiPhile
Copy link
Collaborator

HiFiPhile commented May 1, 2024

Out of curiosity: did you test different TCP_WND with RNDIS as well?

Here is RNDIS on Arch Linux:

MSS vs Mbps default x2 x4
100 7.48 7.54 7.74
200 14.7 14.1 14.7
400 25.6 27.8 27.2
800 38.8 41.8 50.9
1450 38.8 65.4 71.9

At high MSS throughput is little higher than my previous test... so TCP_WND helps RNDIS as well. On Windows 10 there isn't much difference, always at ~11Mbps @ 1450.

And the other point: anything I can do on the driver side to enhance throughput?

With your understanding is my finding about TCP_WND vs NTB size coherent ? How did you chose the size of 3200 ?

It looks like TCP_WND=2*TCP_MSS and NTB_SIZE=1600 has a good performance and balanced RAM usage, does it works well in your case ?

@rgrr
Copy link
Contributor Author

rgrr commented May 1, 2024

Meaning, that RNDIS benefits only for MSS=800/1450 (and is still faster than NCM for this cases :-/).

Buffer size: I have taken the 3200 from the original TinyUSB NCM driver. So no thoughts from my side concerning buffer size.

@rgrr
Copy link
Contributor Author

rgrr commented May 1, 2024

Hmmm... currently checking Zephyr sources. They do not transmit frames from device -> host which are > NET_ETH_MAX_FRAME_SIZE, which is 1500+header_size

@HiFiPhile
Copy link
Collaborator

Not sure if lwip's iperf server is buggy or something else, reverse throughput doesn't work well, most cases it just doesn't transfer.
Also looks like a it always use TCP_MSS as MSS.

MSS vs Mbps RNDIS NCM
100 0 0
200 0 0
400 0 0
800 23 16
1200 23 16
1450 0 0

NTB is not used efficiently, always cut in half.
image

@rgrr
Copy link
Contributor Author

rgrr commented May 2, 2024

Yes, I had the feeling as well and as far as I remember I found a bug in iperf sources but was too busy/lazy to put the fix into their merge machinery.

You can give the attached file a try.

lwiperf.txt

Another point concerning buffer/frame sizes: I'm currently checking Zephyr sources. They do not transmit frames from device -> host which are > NET_ETH_MAX_FRAME_SIZE, which is 1500+header_size

@rgrr
Copy link
Contributor Author

rgrr commented May 2, 2024

Ah... yes... one more thing: in function_ncm.c there is a definition of up/downlink speed. Never really tried values for tuning but perhaps the Linux NCM driver is smart and checks those values?

@rgrr
Copy link
Contributor Author

rgrr commented May 4, 2024

Interesting... when I implemented the driver, I did a lot of performance testing. And honestly: a lot was also guess work. One of the assumptions was, that having at least two buffers for each RX/TX would be of benefit, because while one frame is sent, the other can be filled (TX) or while a received frame is being handled it should be a good idea to have one concurrently receiving new data.

And I'm almost sure, that on my relatively slow RP2040 that was the case: I'm getting there around 5MBit/s in both directions, so the link is at 50% usage.

For your case with hi-speed, it probably doesn't matter because transfer is not the bottleneck.

Nevertheless, I have the feeling we are hunting some performance issues (which are optimizations and perhaps further to investigate).

Any real reason to not merge the driver?

@HiFiPhile
Copy link
Collaborator

HiFiPhile commented May 4, 2024

And I'm almost sure, that on my relatively slow RP2040 that was the case: I'm getting there around 5MBit/s in both directions, so the link is at 50% usage.

Could you make test if current settings works well in your case ?

Edit: I saw some improvement increasing NTBs with Full-Speed on STM32F407.

Nevertheless, I have the feeling we are hunting some performance issues (which are optimizations and perhaps further to investigate).
Any real reason to not merge the driver?

Well main reason is lacking of ressource, even my own PR's are missing from review.

I'd like to ensure if everything works smoothly, since we often met the case where original authors are hard to be contacted when something happens, there are pending issues hard to fix.

We are close, once we choose a parameter set I'll do a formating pass then it's ready.

@rgrr
Copy link
Contributor Author

rgrr commented May 4, 2024

I understand the situation & I'm actually one of those :-/

My last (side) project was based on RP2040 and its SDK which uses TinyUSB et al.

My current (professional) project is based on Zephyr, so chances are that I will drift away from TinyUSB.

Concerning the settings: of course I will test those. Give me some minutes.

@rgrr
Copy link
Contributor Author

rgrr commented May 4, 2024

Gosh... the number of screws to turn tends to drive me crazy: on my device, if setting CFG_TUD_NCM_*_NTB_MAX_SIZE=2048 then iperf produces retries. This can be circumvented by setting wNtbOutMaxDatagrams=1, but this is actually not the intention. So the 3200 are not randomly selected. But don't ask me, why the 2048 length produces errors.

So here are my results (number=buffer-size, n=x means number of buffers):

MSS vs Mbps org 2048/n=1 3200/n=1
100 2.6 1.7 1.7
200 3.5 2.6 2.6
400 4.1 0.5 3.4
800 4.5 0.6 4.1
1450 4.7 4.5 4.5

So the current configuration isn't there for no reason, at least with my configuration. I've done also intensive tests with SystemView and outcome was the configuration you can see in the repo.

@rgrr
Copy link
Contributor Author

rgrr commented May 4, 2024

Perhaps a parameter for wNtbOutMaxDatagrams should be added to allow the regular user as well to enter parameter hell!?

@rgrr
Copy link
Contributor Author

rgrr commented May 5, 2024

fighting with the parameters and trying to understand... that's the current outcome:

#ifndef CFG_TUD_NCM_IN_NTB_MAX_SIZE
    /// must be >> MTU
    #define CFG_TUD_NCM_IN_NTB_MAX_SIZE        (2 * TCP_MSS + 100)     // can be set to 2048 without impact
#endif
#ifndef CFG_TUD_NCM_OUT_NTB_MAX_SIZE
    /// must be >> MTU
    #define CFG_TUD_NCM_OUT_NTB_MAX_SIZE       (2 * TCP_MSS + 100)     // can be set to smaller values if wNtbOutMaxDatagrams==1
#endif

Meaning: CFG_TUD_NCM_OUT_NTB_MAX_SIZE is dependent on TCP_MSS while CFG_TUD_NCM_IN_NTB_MAX_SIZE is not. CFG_TUD_NET_MTU=1514 in my case.

For me it is not clear, why CFG_TUD_NCM_OUT_NTB_MAX_SIZE must be twice TCP_MSS (plus some overhead). Is it a host driver issue or what?

Too many TCP parameters...

@HiFiPhile
Copy link
Collaborator

Perhaps a parameter for wNtbOutMaxDatagrams should be added to allow the regular user as well to enter parameter hell!?

Yes why not.

So the current configuration isn't there for no reason, at least with my configuration. I've done also intensive tests with SystemView and outcome was the configuration you can see in the repo.

If you agree we can set NTB default at (2 * TCP_MSS + 100) and N=1 with a comment mentioning increasing N may increase performance in trade of RAM usage in some cases.

But don't ask me, why the 2048 length produces errors.

Me too I saw this behavior on FS, maybe a driver thing.

@rgrr
Copy link
Contributor Author

rgrr commented May 5, 2024

I have added your suggestions. Currently the NCM driver is not built in the CI. Any idea how to add that?

@HiFiPhile
Copy link
Collaborator

I'll do comment and format things :) Also I have to move #include "lwipopts.h" to example's tusb_config.h, the stack itself should have minimal external dependencies.

@rgrr
Copy link
Contributor Author

rgrr commented May 5, 2024

I'll do comment and format things :) Also I have to move #include "lwipopts.h" to example's tusb_config.h, the stack itself should have minimal external dependencies.

Thank you very much for testing and reviewing.

Copy link
Contributor Author

@rgrr rgrr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, changed according your suggestion.

src/class/net/ncm_device.c Dismissed Show dismissed Hide dismissed
src/class/net/ncm.h Dismissed Show dismissed Hide dismissed
src/class/net/ncm_device.c Dismissed Show dismissed Hide dismissed
src/class/net/ncm.h Dismissed Show dismissed Hide dismissed
@HiFiPhile
Copy link
Collaborator

Thank you for your great effort for this new driver ! Now it's setup.

tusb_control_request_t header;
uint32_t downlink, uplink;
// Alignment must be 4
#define TUD_NCM_ALIGNMENT 4
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this a constant, it's used to be configurable but there was also an error check to ensure TUD_NCM_ALIGNMENT=4, which is quite awkward.


// Number of NCM transfer blocks for reception side
#ifndef CFG_TUD_NCM_OUT_NTB_N
#define CFG_TUD_NCM_OUT_NTB_N 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example build I reduced both NTBs to 1 for less RAM usage, users can read ncm.h for tuning details.

#endif

//------------- CLASS -------------//

// Network class has 2 drivers: ECM/RNDIS and NCM.
// Only one of the drivers can be enabled
#define CFG_TUD_ECM_RNDIS 1
#define CFG_TUD_NCM (1-CFG_TUD_ECM_RNDIS)
#define CFG_TUD_ECM_RNDIS USE_ECM
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#2242 is included

@rgrr
Copy link
Contributor Author

rgrr commented May 6, 2024

Thank you for your great effort for this new driver ! Now it's setup.

Wonderful to hear that my contribution will be part of tinyusb.
And many thanks for your review. It was very constructive and helpful for rechecking some parts.

@HiFiPhile HiFiPhile merged commit e5b6f93 into hathach:master May 7, 2024
68 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants