
Reads with deadline #22

Open
michihenning opened this issue Apr 11, 2023 · 3 comments
@michihenning

I have a client written with NTF. The client connects to an existing server. Immediately after connecting, it sends a 64-byte message to the server, and the server immediately responds with a 64-byte reply. Client and server run on the same machine and talk over the backplane. (The server is pre-existing code, not written using NTF.)

The client socket is created like so:

ntca::StreamSocketOptions sock_opts;
sock_opts.setLingerFlag(false);
sock_opts.setLingerTimeout(0);
socket_ = interface_->createStreamSocket(sock_opts);

After establishing the connection and sending its 64-byte message, the client reads the server's reply like so:

ntca::ReceiveOptions r_opts;
r_opts.setMinSize(64);
r_opts.setMaxSize(64);

auto callback = socket_->createReceiveCallback([this](auto receiver, auto blob, auto event) {
    received_identity(receiver, blob, event);
}); 
auto error = socket_->receive(r_opts, callback);
if (error) {
    CS_LOG(FATAL) << "Cannot schedule receive: " << error.text();
    abort();
}

This works just fine. The callback pops without error, and the client gets the expected response from the server.

Now I want to add a timeout of 500 ms to the client. If the reply from the server does not arrive in that time, the client can take some error action. So, I change the receive like so:

ntca::ReceiveOptions r_opts;
r_opts.setMinSize(64);
r_opts.setMaxSize(64);
const int64_t millisecs = 500;
BloombergLP::bsls::TimeInterval timeout;
timeout.setTotalMilliseconds(millisecs);
r_opts.setDeadline(timeout);

auto callback = socket_->createReceiveCallback([this, millisecs](auto receiver, auto blob, auto event) {
    received_identity(receiver, blob, event, millisecs);
}); 
auto error = socket_->receive(r_opts, callback);
if (error) {
    CS_LOG(FATAL) << "Cannot schedule receive: " << error.text();
    abort();
}

The only change here is the timeout added to the receive options. Now things do not work. The server gets the message as usual and responds to it, but the callback in the client pops immediately with e_WOULD_BLOCK. If I run the client many times, after about 30 or 50 attempts, it successfully gets the response from the server. The whole thing behaves as if the timeout were zero or near zero. Every now and then, the response from the server arrives quickly enough, but most of the time, the timeout prevents things from working.

I played around with the timeout value:

const int64_t millisecs = 500000000000; // 5 * 10^11

This still fails. But once I add another zero, things work on the first try every time:

const int64_t millisecs = 5000000000000; // 5 * 10^12

What gives?

@michihenning
Author

This works:

BloombergLP::bsls::TimeInterval timeout(socket_->currentTime().addMilliseconds(millisecs));
r_opts.setDeadline(timeout);

What I found really confusing is that setDeadline() wants a time interval when, in fact, it wants a time point.
It would be good to have an example in the documentation to show how to do this correctly.

But that still raises the question: why expect a time interval since the epoch, rather than a point in time?

Also, wouldn't it be better to reject the operation at the time it is initiated when it has a deadline that is eons in the past? In other words, a deadline that is < current time is likely to be a bug.

Requiring an absolute time point here might not be the right thing either. For example, my thread might get suspended between getting the current time and initiating the async call; by the time the thread is resumed, the deadline may already have expired. Would it make sense to add a way to set the deadline such that I can say "n milliseconds after the call is initiated"?

@mattrm456
Contributor

mattrm456 commented May 17, 2023

It is an unfortunate reality that timer APIs are a mix of absolute and relative times, with some supporting one but not the other, and some supporting both. For example, pthread_cond_timedwait uses absolute times, epoll_wait uses relative times, and timerfd_settime allows either. The situation is made even more complicated by operating systems that also support specifying the clock type for a timeout... and not all do, for all timed functions.

In designing semantics for timeouts, I favor absolute times for imperative, executed code, and relative times for something like declarative configuration files (absolute timeouts are nonsensical there). There are some exceptions; the connection retry interval is in relative time because it applies a timeout relative to the start of the connection attempt. Relative timeouts in high-level function signatures feel less useful to me because you lose any precise understanding of when the timeout might actually fire: the thread may be preempted or otherwise "wait" for an indeterminate time before the effective timeout is applied.

That being said, programmers may favor relative timeouts and there is no major reason for this library to only support absolute timeouts in its APIs. You raise two legitimate issues:

  1. Users want to specify relative times for timed operations.
  2. Users may be confused by seeing bsls::TimeInterval in a function signature; their bias leads them to assume it represents a relative timeout unless they consult the documentation.

C++ has a nice tradition of using types to communicate semantics and using function overloads to separate intent. I would have vastly preferred to use some type other than bsls::TimeInterval in our vocabulary, but this project has a requirement to use BDE, and that library does not provide something like bsls::TimePoint.

I think we should tackle this issue by acknowledging another one: users should be able to use standard time types (from std::chrono) where it's reasonable to add overloads for them to this library. So here's what I propose:

We add overloads for std::chrono::time_point and std::chrono::duration to the timed operations supported by this library, for example adding ntca::ReceiveOptions::setDeadline(const std::chrono::time_point<std::chrono::system_clock>& deadline). Does this direction seem satisfactory to you?

@michihenning
Author

I agree with you. I raised this issue mainly because of the cognitive dissonance. I'm sure I won't be the only one to assume that a duration is needed rather than an absolute point in time.
I would suggest adding some words everywhere a deadline is used to make it clear that it really is not a duration.
I like the chrono overload suggestion!
