Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check usbnet example with all ports #289

Open
hathach opened this issue Mar 3, 2020 · 31 comments
Open

check usbnet example with all ports #289

hathach opened this issue Mar 3, 2020 · 31 comments

Comments

@hathach
Copy link
Owner

hathach commented Mar 3, 2020

Describe the bug
RNDIS example PR #287 may not work with all ports, following is port that work

  • Microchip SAMD
  • Nordic nRF52840
  • Nuvoton NUC126
  • Nuvoton NUC505
  • STM32 FSDev
  • STM32 (Synopsys)
  • iMX RT1010 work on windows/Linux/macOS (Ubuntu 18.04, Ubuntu 14.04, Windows 7, Windows 10, macOS High Sierra 10.13.16) with gcc 9.2.1 (ARM 2019-q4) after increased stack size from 0x400 to 0x800

Ports that tested and not work

  • Microchip SAMG
  • NXP iMX RT: RT1020-RT1064 work with Windows VM, but doesn't seem to work with native ubuntu 20.04 (more tests needed)
  • NXP lpc 18 / 43 /

Ports that haven't tested yet

  • Nuvoton NUC120
  • NXP LPC3511
  • NXP LPC17/40
  • Sony CXD56
  • TI MSP430
  • Fomu
@pigrew
Copy link
Collaborator

pigrew commented Mar 4, 2020

It looks like the INITIALIZE_CMPLT request should return 52 bytes? Perhaps Windows doesn't care if the returned data is longer than expected.

The class driver control response code always returns 128 bytes:

CFG_TUSB_MEM_SECTION CFG_TUSB_MEM_ALIGN static uint8_t rndis_buf[128];

bool netd_control_request(uint8_t rhport, tusb_control_request_t const * request)
{
// Handle class request only
TU_VERIFY(request->bmRequestType_bit.type == TUSB_REQ_TYPE_CLASS);
TU_VERIFY (_netd_itf.itf_num == request->wIndex);
#if CFG_TUD_NET == OPT_NET_RNDIS
tud_control_xfer(rhport, request, rndis_buf, sizeof(rndis_buf));
#else
(void)rhport;
#endif
return true;
}

It trying to send exactly 128 bytes, usbd_control should append a zero-length-packet, but isn't. Could the SAMD and NUC automatically append a zero-length packet in hardware/driver, but the USB core code is broken in this regard?

For RNDIS, it looks like the message length can be extracted from byte 4 through 7 of the message itself.

(For the moment, I'm not going to do more debugging, so feel free to write patch(es) for this issue. It'd also be a good idea to do a bit more error checking, such as verifying bRequestCode in the control handlers.)

@majbthrd
Copy link
Collaborator

majbthrd commented Mar 4, 2020

No, not exactly 128 bytes. The intent was to transfer request->wLength, but no more than 128. tu_control_xfer() calls tu_min() to cap all transfers to request->wLength.

@pigrew
Copy link
Collaborator

pigrew commented Mar 4, 2020

@majbthrd : Based hathach's log, the SETUP packet has a wLength is 1025, so it wouldn't be truncated. My guess is that the host always sets wLength to 1025, so it'll always send 128 bytes (tu_min(1025,128)).

Regardless, we should figure out why the ZLP is not being sent.

@majbthrd
Copy link
Collaborator

majbthrd commented Mar 5, 2020

I've submitted a PR that should address the STM32 RNDIS issue, and I've regression tested it with SAMD21 and NUC126. It may be worth re-trying on the other targets mentioned above.

@hathach
Copy link
Owner Author

hathach commented Mar 5, 2020

@pigrew @majbthrd great analysis, I think the host expect the usbd to return ZLP packet to indicate the end of transfer. I think this is a great chance for us to fix this ZLP behavior of usbd. It is not easy to get an reproducible issue for us to work on. This will prevent similar issue down the road. @pigrew already did ZLP response previously for MSP430 port, maybe it is merged there, I will try to look it up.

@hathach
Copy link
Owner Author

hathach commented Mar 8, 2020

This commit c8247f0 fix ZLP issue for nrf52840. For STM32F072, it is little bit difference, the device did response with ZLP correctly, however STALL all request afterwards. (capture screenshot updated in 1st post)

@hathach
Copy link
Owner Author

hathach commented Mar 25, 2020

#304 and #305 fixed the issue for STM32 FSDev, will need to do more tests with other mcu :)

@pigrew
Copy link
Collaborator

pigrew commented Mar 25, 2020

STM32F407 still does not work (with the latest master).

After transferring the USB string 4, the following transaction happens:

SOF 976
SOF 977
SETUP ADDR 29 EP 0
DATA0 [ 21 00 00 00 00 00 18 00 ]
ACK
OUT ADDR 29 EP 0
DATA1 [ 02 00 00 00 18 00 00 00 02 00 00   00 01 00 00 00 00 00 00 00 00 40 00 00 ]
NAK
OUT ADDR 29 EP 0
DATA1 [ 02 00 00 00 18 00 00 00 02 00 00   00 01 00 00 00 00 00 00 00 00 40 00 00 ]
ACK
IN ADDR 29 EP 0
NAK
IN ADDR 29 EP 0
NAK

The transaction maintains NAK until Windows gives up.

@hathach
Copy link
Owner Author

hathach commented Mar 25, 2020

@pigrew yeah, there is a different issue with stm32f4, I will check it out later when doing more tests with the rest of mcus as well.

@hathach hathach changed the title check RNDIS with all ports check usbnet example with all ports Apr 16, 2020
@majbthrd
Copy link
Collaborator

majbthrd commented Apr 23, 2020

Just adding to what @pigrew already said...

The problem transfer happens with EP0. A host-to-device class request is handled, and this results in a tud_control_xfer() (./src/class/net/net_device.c:334).

A valid transaction should like this this:

rndisref

and here is what the STM32F4/dcd_synopsys driver appears to be doing:

stm32f4

ADDITIONAL:

The tud_control_xfer() on ./src/class/net/net_device.c:334 succeeds, resulting in a callback to netd_control_complete() on ./src/class/net/net_device.c:223.

That ultimately results in a call to netd_report on ./src/class/net/net_device.c:116, which calls usbd_edpt_xfer() to send a RNDIS notification via EP1 IN. If I comment out the usbd_edpt_xfer() call, the control transfer shown above will succeed and the board will function in Linux. However, Windows will not tolerate this missing notification message (part of of the RNDIS protocol).

@majbthrd
Copy link
Collaborator

OK, @duempel, whilst waiting for someone to volunteer to decipher and fix the dcd_synopsys driver, changing the notify EP from EP1 to EP3 causes the driver to work as expected, allowing net_lwip_webserver to run on the STM32F4(07)DISCOVERY.

To do this, change ./src/examples/device/net_lwip_webserver/src/usb_descriptors.c from:

  // Interface number, string index, EP notification address and size, EP data address (out, in) and size.
  TUD_RNDIS_DESCRIPTOR(ITF_NUM_CDC, STRID_INTERFACE, 0x81, 8, EPNUM_CDC, 0x80 | EPNUM_CDC, CFG_TUD_NET_ENDPOINT_SIZE),

to:

  // Interface number, string index, EP notification address and size, EP data address (out, in) and size.
  TUD_RNDIS_DESCRIPTOR(ITF_NUM_CDC, STRID_INTERFACE, 0x83, 8, EPNUM_CDC, 0x80 | EPNUM_CDC, CFG_TUD_NET_ENDPOINT_SIZE),

@hathach
Copy link
Owner Author

hathach commented Apr 24, 2020

It is useful to have the stack LOG as well, what is captured on the bus is not necessary what the DCD detect if we believe there is bug within DCD e.g the above setup is not detected and submit to the stack --> this will help to narrow down the issue.

OK, @duempel, whilst waiting for someone to volunteer to decipher and fix the dcd_synopsys driver, changing the notify EP from EP1 to EP3 causes the driver to work as expected, allowing net_lwip_webserver to run on the STM32F4(07)DISCOVERY.

To do this, change ./src/examples/device/net_lwip_webserver/src/usb_descriptors.c from:

  // Interface number, string index, EP notification address and size, EP data address (out, in) and size.
  TUD_RNDIS_DESCRIPTOR(ITF_NUM_CDC, STRID_INTERFACE, 0x81, 8, EPNUM_CDC, 0x80 | EPNUM_CDC, CFG_TUD_NET_ENDPOINT_SIZE),

to:

  // Interface number, string index, EP notification address and size, EP data address (out, in) and size.
  TUD_RNDIS_DESCRIPTOR(ITF_NUM_CDC, STRID_INTERFACE, 0x83, 8, EPNUM_CDC, 0x80 | EPNUM_CDC, CFG_TUD_NET_ENDPOINT_SIZE),

but HOW 😕 😕

@duempel
Copy link
Collaborator

duempel commented Apr 24, 2020

Thanks @majbthrd this is just a great start into new day. I tested your suggestion and it worked great using my STM32F105.

I need to do some hardware changes to get a logger running. Then I can have a deeper look inside for testing. It's confusing to me that working perfect on EP3 while working horribly bad on EP1.

@hathach
Copy link
Owner Author

hathach commented Apr 24, 2020

@duempel I would highly suggest you to use one of the stm32f4 discovery board when trouble shooting and analyzing issue. That is to make sure we are all testing the same hardware and anything you encounter is reproducible by everyone.

@majbthrd
Copy link
Collaborator

FYI: another standalone STM32F4(07)DISCO workaround that enables net_lwip_webserver is to use the dcd_int_handler as a poll routine instead of an interrupt handler:

In ./src/portable/st/synopsys/dcd_synopsis.c, comment out the interrupt enable:

void dcd_int_enable (uint8_t rhport)
{
  (void) rhport;
//  NVIC_EnableIRQ(OTG_FS_IRQn);
}

In ./src/device/usbd.c, add a call to dcd_int_handler():

void tud_task (void)
{
  dcd_int_handler(0);
  // Skip if stack is not initialized

et voila, everything works.

@majbthrd
Copy link
Collaborator

majbthrd commented Apr 25, 2020

That this modification causes things to work with the STM32F4(07)DISCO is bizarre.

By adding this to transmit_packet() in ./src/portable/st/synopsys/dcd_synopsis.c:

if (1==fifo_num)
{
  (* tx_fifo) = 1;
  (* tx_fifo) = 0;
  return;
}

to run prior to the normal copy to TX FIFO routine, it works under Linux. The same EP1 transaction also works in Windows, but things go off the rails later on with a EP2 transfer.

The exact same values are being written by the normal routine; they just happen a bit faster in the above.

Without the above modification, the EP1 TXFE sub-interrupt just fires over and over and over again.

@hathach
Copy link
Owner Author

hathach commented Apr 25, 2020

OK, @duempel, whilst waiting for someone to volunteer to decipher and fix the dcd_synopsys driver, changing the notify EP from EP1 to EP3 causes the driver to work as expected, allowing net_lwip_webserver to run on the STM32F4(07)DISCOVERY.

To do this, change ./src/examples/device/net_lwip_webserver/src/usb_descriptors.c from:

  // Interface number, string index, EP notification address and size, EP data address (out, in) and size.
  TUD_RNDIS_DESCRIPTOR(ITF_NUM_CDC, STRID_INTERFACE, 0x81, 8, EPNUM_CDC, 0x80 | EPNUM_CDC, CFG_TUD_NET_ENDPOINT_SIZE),

to:

  // Interface number, string index, EP notification address and size, EP data address (out, in) and size.
  TUD_RNDIS_DESCRIPTOR(ITF_NUM_CDC, STRID_INTERFACE, 0x83, 8, EPNUM_CDC, 0x80 | EPNUM_CDC, CFG_TUD_NET_ENDPOINT_SIZE),

just tested the changing EP1 to EP3 and it works as you said. It is probably a bitmask mis-calculation or something. Checking it now, hmmm.

@hathach
Copy link
Owner Author

hathach commented Apr 26, 2020

PR #380 should fix the issue with stm32f4, there is a detail explanation for the root cause there. Though it doesn't explain why changing from EP1 IN to EP3 IN could help with the issue though. Probably a hw race condition or something.

@majbthrd
Copy link
Collaborator

@hathach, well done! I've done a quick check with net_lwip_webserver operation using the STM32F4(07)DISCO with Linux, Windows, and macOS, and everything seems functional.

@majbthrd
Copy link
Collaborator

@hathach, I just noticed that NUC505 is listed at the top of this thread as having not been tested yet. I've been using it locally for a month or two, so I believe it can be put in the working list.

@hathach
Copy link
Owner Author

hathach commented Apr 27, 2020

@hathach, I just noticed that NUC505 is listed at the top of this thread as having not been tested yet. I've been using it locally for a month or two, so I believe it can be put in the working list.

ah thanks, I have updated the 1st post. NUC505 is too painful to flash, I already forgot how the dip switches work 😅 This lwip_webserver so far is really useful to spot issues on several dcds so far.

@duempel
Copy link
Collaborator

duempel commented Apr 27, 2020

@duempel I would highly suggest you to use one of the stm32f4 discovery board when trouble shooting and analyzing issue. That is to make sure we are all testing the same hardware and anything you encounter is reproducible by everyone.

Yes, I absolutely agree with you. It's just that I don't have access to any of the designated boards right now. Of course I will do my best not to arbitrarily publish any issues caused by a particular hardware configuration. Since we know that usbnet's problems were related to the Synopsys driver, it is important information to also see that the different workarounds lead to the same system behavior.

PR #380 should fix the issue with stm32f4, there is a detail explanation for the root cause there. Though it doesn't explain why changing from EP1 IN to EP3 IN could help with the issue though. Probably a hw race condition or something.

Thanks @hathach , you have done a great job on this PR. It's working fine here.

@majbthrd
Copy link
Collaborator

majbthrd commented Oct 7, 2020

With #531, the iMX RT1010 now works, and I suspect the same minor tweak to the stack size will remedy others.

@hathach
Copy link
Owner Author

hathach commented Oct 8, 2020

thanks @majbthrd I have updated the 1st post. I will pull out other boards to test with in the weekend.

@hathach
Copy link
Owner Author

hathach commented Oct 9, 2020

@majbthrd i just pull-out the rt1020-evk and rt1060-evk to test with

  • both 1020 & 1060 work (page loaded) with default stack size of 0x400 on Windows VM
  • both failed to load page with default/increased stack size on Linux

Would you mind testing your 1010 with Linux machine, since Linux use ECM and Windows use RNDIS, maybe there is a bit of issue somewhere. I am short on time to troubleshoot this for now, but if you could confirm the issue, I will update the 1st post that will make it easier to address this later on.

@majbthrd
Copy link
Collaborator

majbthrd commented Oct 9, 2020

@hathach , that's so strange, as I used Linux to test the RT1010 port. This weekend, I'll take a more in depth look.

@hathach
Copy link
Owner Author

hathach commented Oct 9, 2020

@hathach , that's so strange, as I used Linux to test the RT1010 port. This weekend, I'll take a more in depth look.

That is weird 🤔🤔

@majbthrd
Copy link
Collaborator

majbthrd commented Oct 10, 2020

@hathach, I think the stack size change that I suggested was unnecessary. I measured peak stack usage at 684 bytes, and that is below the original, default of 1024 bytes. (In turns out my initial problems with running TinyUSB on the IMXRT1010-EVK had a lot more to do with the flaky built-in DAP on the evaluation board.)

I tested the IMXRT1010-EVK using net_lwip_webserver with the original stack size of 0x400 with Windows, Linux, and MacOS. I saw no issue.

@hathach, your mention of ECM under Linux and Windows under VM makes me wonder if there were not other factors involved. As net_lwip_webserver is presently written, Linux uses RNDIS (not ECM) because RNDIS is the default configuration.

What makes net_lwip_webserver more demanding than the other device examples is that it has multiple configurations. The default (1st config) is RNDIS and the 2nd config is ECM.

As you know, TinyUSB does not support dynamically switching between multiple configurations. TinyUSB will only work if the host chooses a non-zero configuration once. Looking back on past discussions (i.e. #345 and #340), it was decided that adding dynamic switching would depend on future work on adding dcd_edpt_close() to all the drivers.

Sure enough, any attempt to dynamically switch between configurations (using usb_modeswitch in Linux) causes problems (USB traffic stops after a USB SET_CONFIG 0 then SET_CONFIG 2), but I confirmed this is true of any TinyUSB target, not just IMXRT1010.

If there is doubt as to whether ECM works in Linux, you could confirm this, if you wanted to, by temporarily swapping around the CONFIG_ID_RNDIS and CONFIG_ID_ECM values in ./examples/device/net_lwip_webserver/src/usb_descriptors.c This would cause ECM to be selected by default.

@hathach
Copy link
Owner Author

hathach commented Oct 11, 2020

@hathach, I think the stack size change that I suggested was unnecessary. I measured peak stack usage at 684 bytes, and that is below the original, default of 1024 bytes. (In turns out my initial problems with running TinyUSB on the IMXRT1010-EVK had a lot more to do with the flaky built-in DAP on the evaluation board.)

Ah right, daplink is bit slow partly since it is external qspi flash, I have updated the flash-pyocd target, it should be easier to flash.

I tested the IMXRT1010-EVK using net_lwip_webserver with the original stack size of 0x400 with Windows, Linux, and MacOS. I saw no issue.

Ooops, I will try to do more test, maybe it is some OS setting, btw which Linux distro you are testing with, I am on ubuntu 20.04

@hathach, your mention of ECM under Linux and Windows under VM makes me wonder if there were not other factors involved. As net_lwip_webserver is presently written, Linux uses RNDIS (not ECM) because RNDIS is the default configuration.

Ah right, I kind of forgot, which Linux pick, I mixed up, macOS is the one that pick ECM :)

If there is doubt as to whether ECM works in Linux, you could confirm this, if you wanted to, by temporarily swapping around the CONFIG_ID_RNDIS and CONFIG_ID_ECM values in ./examples/device/net_lwip_webserver/src/usb_descriptors.c This would cause ECM to be selected by default.

I have no doubt on the ECM driver, it works with other MCUs just fine, probably some issue with dcd driver of rt10xx though, since you test it successfully. I will look at other angle, btw I don't have the rt1010 evk, only has has other 1020-1064. So my test is probably not exact. Thank you very much for testing it again, I will carry more testing with my local machine more.

@majbthrd
Copy link
Collaborator

Ooops, I will try to do more test, maybe it is some OS setting, btw which Linux distro you are testing with, I am on ubuntu 20.04

I'm testing against: Ubuntu 18.04, Ubuntu 14.04, Windows 7, Windows 10, macOS High Sierra 10.13.16

I'm compiling with: gcc 9.2.1 (ARM 2019-q4)

Thanks for all your hard work, @hathach !

@hathach
Copy link
Owner Author

hathach commented Oct 12, 2020

Ooops, I will try to do more test, maybe it is some OS setting, btw which Linux distro you are testing with, I am on ubuntu 20.04

I'm testing against: Ubuntu 18.04, Ubuntu 14.04, Windows 7, Windows 10, macOS High Sierra 10.13.16

I'm compiling with: gcc 9.2.1 (ARM 2019-q4)

Thanks for all your hard work, @hathach !

thanks for your os info. I have updated the 1st post, I will test it out later on. Have to switch on other works. But I think it is probably something minor :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: 🆕 New
Development

No branches or pull requests

4 participants