
Reads with deadline #22

Open
michihenning opened this issue Apr 11, 2023 · 3 comments
@michihenning

I have a client written with NTF. The client connects to an existing server. Immediately after connecting, it sends a 64-byte message to the server, and the server immediately responds with a 64-byte reply. Client and server run on the same machine and talk over the backplane. (The server is pre-existing code, not written using NTF.)

The client socket is created like so:

ntca::StreamSocketOptions sock_opts;
sock_opts.setLingerFlag(false);
sock_opts.setLingerTimeout(0);
socket_ = interface_->createStreamSocket(sock_opts);

After establishing the connection and sending its 64-byte message, the client reads the server's reply like so:

ntca::ReceiveOptions r_opts;
r_opts.setMinSize(64);
r_opts.setMaxSize(64);

auto callback = socket_->createReceiveCallback([this](auto receiver, auto blob, auto event) {
    received_identity(receiver, blob, event);
}); 
auto error = socket_->receive(r_opts, callback);
if (error) {
    CS_LOG(FATAL) << "Cannot schedule receive: " << error.text();
    abort();
}

This works just fine. The callback pops without error, and the client gets the expected response from the server.

Now I want to add a timeout of 500 ms to the client. If the reply from the server does not arrive in that time, the client can take some error action. So, I change the receive like so:

ntca::ReceiveOptions r_opts;
r_opts.setMinSize(64);
r_opts.setMaxSize(64);
const int64_t millisecs = 500;
BloombergLP::bsls::TimeInterval timeout;
timeout.setTotalMilliseconds(millisecs);
r_opts.setDeadline(timeout);

auto callback = socket_->createReceiveCallback([this, millisecs](auto receiver, auto blob, auto event) {
    received_identity(receiver, blob, event, millisecs);
}); 
auto error = socket_->receive(r_opts, callback);
if (error) {
    CS_LOG(FATAL) << "Cannot schedule receive: " << error.text();
    abort();
}

The only change here is the timeout added to the receive options. Now things do not work. The server gets the message as usual and responds to it, but the callback in the client pops immediately with e_WOULD_BLOCK. If I run the client many times, after about 30 or 50 attempts, it successfully gets the response from the server. The whole thing behaves as if the timeout were zero or near zero. Every now and then, the response from the server arrives quickly enough, but most of the time, the timeout prevents things from working.

I played around with the timeout value:

const int64_t millisecs = 500000000000; // 5 * 10^11

This still fails. But once I add another zero, things work on the first try every time:

const int64_t millisecs = 5000000000000; // 5 * 10^12

What gives?

@michihenning
Author

This works:

BloombergLP::bsls::TimeInterval timeout(socket_->currentTime().addMilliseconds(millisecs));
r_opts.setDeadline(timeout);

What I found really confusing is that setDeadline() wants a time interval when, in fact, it wants a time point.
It would be good to have an example in the documentation to show how to do this correctly.

But that still raises the question: why expect a time interval since the epoch, rather than a point in time?

Also, wouldn't it be better to reject the operation at the time it is initiated when it has a deadline that is eons in the past? In other words, a deadline that is < current time is likely to be a bug.

Requiring an absolute time point here might not be the right thing either. For example, my thread might get suspended between getting the current time and initiating the async call; by the time the thread is resumed, the deadline may already have expired. Would it make sense to add a way to set the deadline such that I can say "n milliseconds after the call is initiated"?

@mattrm456
Contributor

mattrm456 commented May 17, 2023

It is an unfortunate reality that timer APIs are a mix of absolute and relative times, with some supporting one but not the other, and some supporting both. For example, pthread_cond_timedwait uses absolute times, epoll_wait uses relative times, and timerfd_settime allows either. The situation is made even more complicated by operating systems that also support specifying the clock type for a timeout... and not all do, for all timed functions.

In designing semantics for timeouts, I favor absolute times for imperative, executed code, and relative times for something like declarative configuration files (absolute timeouts are nonsensical there). There are some exceptions; the connection retry interval is in relative time because it applies a timeout relative to the start of the connection attempt. Relative timeouts in high-level function signatures feel less useful to me because you lose any precise understanding of when the timeout might actually fire: the thread may be preempted or otherwise "wait" for an indeterminate time before the effective timeout is applied.

That being said, programmers may favor relative timeouts and there is no major reason for this library to only support absolute timeouts in its APIs. You raise two legitimate issues:

  1. Users want to specify relative times for timed operations.
  2. Users may be confused by seeing bsls::TimeInterval in a function signature; their bias leads them to assume it represents a relative timeout unless they consult the documentation.

C++ has a nice tradition of using types to communicate semantics and using function overloads to separate intent. I would have vastly preferred to use some type other than bsls::TimeInterval in our vocabulary, but this project has a requirement to use BDE, and that library does not provide something like bsls::TimePoint.

I think we should tackle this issue by acknowledging another one: users should be able to use standard time types (from std::chrono) where it's reasonable to add overloads for them to this library. So here's what I propose:

We add overloads for std::chrono::time_point and std::chrono::duration to the timed operations supported by this library, for example adding ntca::ReceiveOptions::setDeadline(const std::chrono::time_point<std::chrono::system_clock>& deadline). Does this direction seem satisfactory to you?

@michihenning
Author

I agree with you. I raised this issue mainly because of the cognitive dissonance. I'm sure I won't be the only one to assume that a duration is needed rather than an absolute point in time.
I would suggest adding some words everywhere a deadline is used to make it clear that it really is not a duration.
I like the chrono overload suggestion!
