Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New defaults in libmbedtls >= 3.3.0 break NNG TLS functions #1661

Closed
shikokuchuo opened this issue May 12, 2023 · 13 comments
Closed

New defaults in libmbedtls >= 3.3.0 break NNG TLS functions #1661

shikokuchuo opened this issue May 12, 2023 · 13 comments

Comments

@shikokuchuo
Copy link
Contributor

shikokuchuo commented May 12, 2023

Describe the bug

As I had reported, mbedtls changed its default behaviour in 3.3.0, enabling MBEDTLS_SSL_DTLS_CONNECTION_ID.
Full description here: https://github.com/Mbed-TLS/mbedtls/releases/tag/v3.3.0

The changes require the following lines in the source file ‘include/mbedtls/mbedtls_config.h’ to be commented out:

#define MBEDTLS_SSL_DTLS_CONNECTION_ID
#define MBEDTLS_SSL_DTLS_CONNECTION_ID_COMPAT 0

Re-posting here as this was also observed in #1660 and shouldn't be lost.

It might be better to make NNG compatible, to play well with system packaged libmbedtls libraries.

Expected behavior
No different to mbedtls < 3.3.0

Actual Behavior
If defaults are not changed, will segfault when used in binding calling https or wss function; fatal alert message in the case of #1660

** Environment Details **

  • NNG 1.6.0a (c5e9d8a)
  • Ubuntu 22.04
  • gcc 11.3
  • static lib
@jritts
Copy link

jritts commented May 20, 2023

Hello, I was the one who created issue #1660 - thanks again for that work around.

I've hit another issue which may be related: calls to nng_dial_start() for a "tls+tcp://" REQ socket are returning NNG_ECONNSHUT. This is on Windows.

Strangely, nng_dial_start() only returns that error most of the time; once in a while, the tls+tcp connection succeeds.

If I switch both nodes to "tcp://" (and bypass the TLS config calls), dialing always succeeds.

With the identical REP code, but simply switching the listener to the wss:// schema, browser websocket connections always succeed.

No such issues with tls+tcp PAIR1 connections.

From past socket programming experience, it smells like an address reuse issue, but I'm curious if you think this is related to the same Mbed-TLS changes and recommend I back mbed out to v3.2 or earlier. Or do I need to explicitly configure listeners to allow address reuse?

Perhaps I should make a separate issue for this.

@shikokuchuo
Copy link
Contributor Author

@jritts if it's not too much trouble, can you simply test the behaviour using either mbedtls 3.2 or the 2.28 branch? This is the LTS branch so is current and maintained. From your description I suspect this is a new issue, and testing it will confirm.

Also it would make the issue much clearer for @gdamore if you can confirm exactly what works and what doesn't in terms of the combinations of req / rep, dial / listen, tls+tcp / wss.

Thanks again for reporting this. Currently I've been focusing on other areas, but I do expect to come back to TLS in a month or two so it's great that you're paving the way.

One last thing, if you're using the NNG tip, try the 8e1836f build, because of #1662.

@jritts
Copy link

jritts commented May 21, 2023

Will do so this week and report back - thanks again for your help.

@jritts
Copy link

jritts commented May 22, 2023

I'm seeing the same issue with the versions you mentioned - NNG at 8e1836f, mbed at 981743d. nng_version() returns "1.6.0pre", mbedtls_version_get_string "2.28.3".

So I'm probably doing something wrong then. I'll build NNG's rep-req example against the same libraries and work forward from there.

Thanks again.

@jritts
Copy link

jritts commented May 22, 2023

It was my fault: I had misunderstood what was meant by "synchronous" in the documentation and was opening multiple parallel REQ sockets from a single client to the same REP. I had assumed REQ sockets were just expected to call send and recv in 1-to-1 serial pairs, but that I could have multiple threads acting as separate clients.

So I'm good to go - thanks again for the help and the info about the LTS versions. Please let me know if there's any other testing I could do to contribute.

@jritts
Copy link

jritts commented May 22, 2023

My WSS client is single threaded, so no issue there.

As for why it appeared to work for parallel req clients when TLS was disabled, maybe the TLS handshake was adding to connection time and the risk of a collision. I’ll confirm that.

@jritts
Copy link

jritts commented May 22, 2023

maybe the TLS handshake was adding to connection time and the risk of a collision. I’ll confirm that.

Nope, I added sleeps to ensure multiple threads are concurrently between socket create and close, and I cannot get nng_dial_start to fail with plain tcp. Only tls+tcp appears to disallow parallel REQs.

@shikokuchuo
Copy link
Contributor Author

It was my fault: I had misunderstood what was meant by "synchronous" in the documentation and was opening multiple parallel REQ sockets from a single client to the same REP. I had assumed REQ sockets were just expected to call send and recv in 1-to-1 serial pairs, but that I could have multiple threads acting as separate clients.

I am not sure I follow - multiple req / multiple rep are both permitted. Have you looked into the use of contexts to keep the requests separate?

So I'm good to go - thanks again for the help and the info about the LTS versions. Please let me know if there's any other testing I could do to contribute.

Glad it worked for you in the end. If your project is open-source, feel free to share a link. Otherwise, the best of luck!

@jritts
Copy link

jritts commented May 22, 2023

multiple req / multiple rep are both permitted
Have you looked into the use of contexts

In my case, each request was associated with its own nng_socket. I'll put together a self contained example to reproduce it.

@jritts
Copy link

jritts commented May 23, 2023

Here's a self-contained test for vs2022, with the output of one run on my machine. I highlighted the case where dialing a tls+tcp listener from multiple REQ sockets on different threads failed. Most of the time, all 4 dials fail, but sometimes one works.

The prebuilt libs are the version you mentioned - NNG 8e1836f, mbed 981743d.

While running this test repeatedly, I also saw a crash in an NNG thread, but it's difficult to reproduce. Will try a bit more.

multiple_tls_dialer_fail
NNGReqRepTest.zip

@jritts
Copy link

jritts commented May 23, 2023

I looped the test, and after 1-2 minutes I saw this hang:

hang

@shikokuchuo
Copy link
Contributor Author

@gdamore I got round to attempting a patch for this issue and found that I could no longer re-produce. I think it may have just been that I updated a system install of mbedtls without re-building NNG against it - or else some kind of cmake failure to detect the correct version. In any case, just confirming that building NNG against a clean mbedtls v3.4 does work for me now.

@jritts I'm afraid I am not running Windows so can't confirm. I'm closing this issue given my finding above. You should re-open any outstanding issues you have.

@jritts
Copy link

jritts commented Jun 27, 2023

I assume there's no reason the believe the crash on Windows reproduced with the test project I attached is fixed, so I created a separate issue for it here.

#1668

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants