Is it possible to create a reliable cross platform listener for pipe0 that waits until specified socket is released by other process #1567

Open
gehreleth opened this issue Jan 13, 2022 · 9 comments

Comments

@gehreleth

In the code below, nng_listener_start returns 10 (NNG_EADDRINUSE) when the given address is already taken by a concurrent process, even though the NNG_FLAG_NONBLOCK flag is specified. Is it possible to make the listener wait until the address becomes available without creating
auxiliary threads? As far as I understand, NNG's internal aio pools are not available to client code, so I can't offload periodic listen attempts to one of the NNG threads.

  rv = nng_listener_create(&connection_nng->connection_watchdog.listener,
    connection_nng->transport.socket, uri);
  if (rv != 0) {
    TWEAK_LOG_ERROR("nng_listener_create returned %d", rv);
    finalize_transport(&connection_nng->transport);
    return TWEAK_WIRE_ERROR;
  }

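  /* Fails immediately with NNG_EADDRINUSE (10) while another process still
   * holds the address; NNG_FLAG_NONBLOCK does not make it wait. */
  rv = nng_listener_start(connection_nng->connection_watchdog.listener,
    NNG_FLAG_NONBLOCK);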
  if (rv != 0) {
    TWEAK_LOG_ERROR("nng_listener_start returned %d", rv);
    nng_listener_close(connection_nng->connection_watchdog.listener);
    finalize_transport(&connection_nng->transport);
    return TWEAK_WIRE_ERROR;
  }
@sergei-zykov

sergei-zykov commented Feb 8, 2022

Hello.

Let me explain this issue:

The main question, at a more abstract level, is how robust this library is, given that:

  1. An interface could be unavailable at the moment nng_listener_start is called but become available later on. Suppose a TCP address:port is occupied by a process that is about to terminate but hasn't released the port yet.
  2. An interface could become unavailable later on. Say an NNG listener is bound to the TCP address of a USB Ethernet adapter, and the user physically removes the adapter and then plugs it back in.

Would the NNG library try to listen on that interface again, or is an external watchdog thread needed to achieve that? In my opinion, an NNG connection is expected to be robust, and an external watchdog defeats the purpose of this library.

@gdamore
Contributor

gdamore commented Feb 9, 2022

At present, this behavior is not available in NNG, although it is possible to have a thread that sits in a loop and calls nng_listener_start() repeatedly until it succeeds (presumably sleeping a second or two between retries).

Nanomsg does have this ability, but we found it to be problematic and confusing for most users, so we didn't implement it in NNG.

I would possibly be amenable to implementing this capability. If you need it urgently, let me know out of band and I can try to allocate some time at a higher priority.
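The thread-based workaround described above can be sketched as a small bounded retry loop. This is a minimal sketch, not an NNG API: `start_fn`, `sleep_fn`, `start_with_retries` and the `fake_listener` simulation are hypothetical names. In a real program the attempt callback would wrap `nng_listener_start()`, the sleep callback could wrap `nng_msleep()`, and the loop would run on its own thread. The value 10 is NNG_EADDRINUSE, as reported in this thread.

```c
/* Hypothetical callback types (not part of NNG):
 * start_fn would wrap nng_listener_start() and return 0 or an NNG error;
 * sleep_fn could wrap nng_msleep(). */
typedef int (*start_fn)(void *ctx);
typedef void (*sleep_fn)(unsigned ms);

/* Blocking retry loop, meant to run on a dedicated thread.
 * Only busy_err (e.g. NNG_EADDRINUSE, value 10) is treated as transient;
 * success (0) or any other error ends the loop immediately. */
int start_with_retries(start_fn attempt, void *ctx, int busy_err,
                       int max_attempts, unsigned delay_ms, sleep_fn pause)
{
    int rv = busy_err;
    for (int i = 0; i < max_attempts; i++) {
        rv = attempt(ctx);
        if (rv != busy_err) {
            return rv; /* started, or a non-transient error */
        }
        if (i + 1 < max_attempts) {
            pause(delay_ms); /* back off before the next attempt */
        }
    }
    return rv; /* address still busy after max_attempts tries */
}

/* Simulated listener for illustration only: reports "address in use"
 * (10) for the first busy_for calls, then succeeds. */
struct fake_listener {
    int calls;
    int busy_for;
};

static int fake_start(void *ctx)
{
    struct fake_listener *l = ctx;
    l->calls++;
    return (l->calls <= l->busy_for) ? 10 : 0;
}

static unsigned total_slept_ms = 0;

static void fake_sleep(unsigned ms)
{
    total_slept_ms += ms; /* record the delay instead of sleeping */
}
```

Treating only NNG_EADDRINUSE as transient keeps configuration errors (for example, an invalid address) from being retried forever, and bounding the number of attempts lets the caller decide what to do if the other process never releases the address.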

@sergei-zykov

This is not urgent; our team is still discussing possible ways of dealing with this problem. At the moment, some of our automated tests sporadically fail (in roughly 3-5% of runs) because one instance of the test binary hasn't released pipe0 in time.

@sergei-zykov

Hello.

Yes, I'd like to confirm that this feature is indeed needed for our project.
Creating an auxiliary thread that keeps retrying nng_listen until it returns success is considered writing flaky code full of workarounds.

Sorry for the delay.

@gdamore
Contributor

gdamore commented Mar 28, 2022

OK, I'll see if I can put something together, although honestly my code is probably going to do exactly the same thing, just using asynchronous callbacks instead of a dedicated thread. :-)
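The callback-based approach mentioned above can keep the same retry policy while dropping the sleeping thread. Below is a minimal sketch of the decision step only, assuming the application drives it from a timer callback; `listen_retry`, `listen_retry_on_result` and the `RETRY_*` values are hypothetical names, not part of NNG.

```c
/* What the timer callback should do after one start attempt. */
enum retry_action {
    RETRY_DONE,   /* listener started */
    RETRY_AGAIN,  /* address still busy: re-arm the timer */
    RETRY_GIVE_UP /* non-transient error, or attempts exhausted */
};

/* Hypothetical per-listener retry state, passed as the callback argument. */
struct listen_retry {
    int busy_err;      /* transient error, e.g. NNG_EADDRINUSE (10) */
    int attempts_left; /* remaining attempts, including the current one */
    int last_rv;       /* result of the most recent attempt */
};

/* Feed in the result of one start attempt (e.g. the return value of
 * nng_listener_start()) and get the next action. This never blocks or
 * sleeps; the caller is responsible for scheduling the next attempt. */
enum retry_action listen_retry_on_result(struct listen_retry *r, int rv)
{
    r->last_rv = rv;
    if (rv == 0) {
        return RETRY_DONE;
    }
    r->attempts_left--;
    if (rv != r->busy_err || r->attempts_left <= 0) {
        return RETRY_GIVE_UP;
    }
    return RETRY_AGAIN;
}
```

One possible, untested way to drive this with NNG 1.x public calls: allocate an aio with `nng_aio_alloc()` whose completion callback first checks `nng_aio_result()` (so a stopped or canceled aio ends the loop), then calls `nng_listener_start()`, passes the result to `listen_retry_on_result()`, and on `RETRY_AGAIN` re-arms itself with `nng_sleep_aio()`. Because aio callbacks run on NNG-managed threads, this avoids an auxiliary application thread, assuming the bind attempt inside the callback returns quickly.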

@alzix
Contributor

alzix commented Nov 1, 2022

Did some tests on Windows 10 x64: once Windows gets into the problematic state (NNG_EADDRINUSE, "Address in use"), in some cases no amount of retrying helps, even after a long sleep.
Tried a 2-second delay x 3 retries, with no luck.
Also tried closing the client's socket and dialer between listen retries to simulate a well-behaved client,
as per the MS documentation referenced from #1175.

You can find my test here: https://gist.github.com/alzix/2f9fb93b026835eb9ba57d780966b658

My tests show that closing the process and restarting it does help, so when the test executable runs in a loop, the next invocation typically passes.
For example:

--------------------------------------------------------------------------------
execute loop 27
Running main() from _deps\googletest-src\googletest\src\gtest_main.cc
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from NNG
[ RUN      ] NNG.test_recon
nng_listen failed res=10(Address in use)
ERR! iter=5985 retry_count=0 10(Address in use)
nng_listen failed res=10(Address in use)
ERR! iter=5985 retry_count=1 10(Address in use)
nng_listen failed res=10(Address in use)
ERR! iter=5985 retry_count=2 10(Address in use)
..\tester\unit_tests\uc_ipc_tests.cpp(231): error: Expected equality of these values:
  status
    Which is: 10
  0
[  FAILED  ] NNG.test_recon (6797 ms)
[----------] 1 test from NNG (6798 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (6800 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] NNG.test_recon

 1 FAILED TEST
--------------------------------------------------------------------------------
execute loop 28
Running main() from _deps\googletest-src\googletest\src\gtest_main.cc
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from NNG
[ RUN      ] NNG.test_recon
[       OK ] NNG.test_recon (1196 ms)
[----------] 1 test from NNG (1197 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (1200 ms total)
[  PASSED  ] 1 test.

@gdamore
Contributor

gdamore commented Sep 14, 2023

If your application has exhausted the socket address space, a previous application is holding the address, or the connection died abnormally, the socket address can be held for the duration of the socket timeout, which can be a while.

@alzix
Contributor

alzix commented Nov 2, 2023

The test was performed on Windows with named pipes: once the pipe is closed, in some cases it cannot be reopened.
When this happens, restarting the process usually helps; delays and retries within the same process do not.

@gdamore
Contributor

gdamore commented Nov 26, 2023

I really want to throw Windows named pipes into the dumpster, but probably not right away. Maybe for a 2.0 release, because it will be a breaking change.

@gdamore gdamore changed the title Is it possible to create a relable cross platform listener for pipe0 that waits until specified socket is released by other process Is it possible to create a reliable cross platform listener for pipe0 that waits until specified socket is released by other process Jan 2, 2024

4 participants