RFC: interruptible POSIX locks #12

pwmarcz · 2021-07-07T13:19:10Z

I'm trying to fix recently added POSIX locks (gramineproject/graphene#2481) so that you can interrupt them, but I ran into a problem trying to do it with current IPC. I don't want to jump and change too much before I learn your opinion.

@boryspoplawski @mkow @dimakuv

The problem

Graphene's current implementation of POSIX locks (fcntl) do not support interrupting the request:

void handler(int signum) {
    printf("got signal %d\n", signum);
}
signal(SIGALRM, &handler);

fcntl(fd, F_SETLK, ...);
pid_t pid = fork();
if (pid == 0) {
    // child process:

    // trigger SIGALRM after 10 seconds
    alarm(10);

    // this should be interrupted:
    fcntl(fd, F_SETLKW, ...);
    ...
}

The fcntl(F_SETLKW) in the child process should wait for 10 seconds and then fail with EINTR (after the process receives SIGALRM). Under Graphene, it hangs indefinitely.

So far, I've seen this checked by LTP (test fcntl16), and I'm pretty sure it's used by stress-ng (see gramineproject/graphene#2510). I'm not sure how much it's used in actual applications, but interrupting the fcntl call seems to be the only sensible way of trying to take a lock with timeout.

How to interrupt an operation over IPC?

Interrupting the lock operation is easy in the single-process version. However, if we are in a remote process, the operation is requested over IPC, and it's not that easy to achieve the same semantics.

The current IPC implementation does something like this:

lock_msg = { POSIX_LOCK_SET, .wait = true, ... };

// this waits for response, and retries on EINTR
ipc_send_message_and_get_response(lock_msg, &result);
return result;

As long as ipc_send_message_and_get_response keeps retrying, we have no way to return earlier. I've been thinking about some solutions:

Stop waiting?: You could be tempted to modify the ipc_send_message_and_get_response() function to not wait (perhaps with some retry = false parameter). However, this does not actually cancel the operation, so the lock will be taken eventually. This is wrong - we want the F_SETLKW operation to fail without side effects.

Stop waiting and issue another call? We could send another IPC message to cancel the pending operation. Keep in mind that the operation might get finished before the "cancel" message is delivered, so we need to learn what the actual result is.

lock_msg = { POSIX_LOCK_SET, .wait = true, ... };

// this returns -EINTR if the wait is interrupted
ret = ipc_send_message_and_get_response(lock_msg, /*retry=*/false, &result);

if (ret == -EINTR) {
    // Cancel, and get result. `result` will be -EINTR if we succesfully canceled the operation, 
    // or something else (0, -ENOMEM, ...) if the operation finished in the meantime
    cancel_msg = { POSIX_LOCK_CANCEL, msg.seq, ... };
    ipc_send_message_and_get_response(cancel_msg, /*retry=*/true, &result);
}
return result;

The problem with that is that if the operation does get finished in the meantime, the remote side has to keep storing the result in case we receive POSIX_LOCK_CANCEL. So we would need some way of cleaning up that result.

Also, if the response to POSIX_LOCK_SET is sent in the meantime, we are going to just drop it.

Keep waiting for the original response? Ideally, we want to always get the response for the original message, but sometimes we want to trigger the response to come early. So we could register a wait, but on EINTR, send additional message in the meantime, then return to the previous wait:

lock_msg = { POSIX_LOCK_SET, .wait = true, ... };

ipc_send_message_and_register_wait(lock_msg);
ret = wait_for_response(lock_msg, /*retry=*/false);
if (ret == -EINTR) {
     // Cancel. That will cause response to `lock_msg` to be sent without further delay
     // (if it hasn't been sent already).
     cancel_msg = { POSIX_LOCK_CANCEL, msg.seq, ... };
     ipc_send_message(cancel_msg);

     // Wait for response to the original message. `result` will be -EINTR if we canceled the operation,
     // or something else (0, -ENOMEM...) if the operation finished in the meantime.
     wait_for_response(lock_msg, /*retry=*/true, &result);
}
return result;

I like that, but it makes the IPC API more complicated: instead of one function that does everything, there are separate stages (first send a message and register a wait, then wait for response). That means managing some state between calls (a "waiter" registered for the IPC worker) that currently is local to ipc_send_message_and_get_response.

Add an on_interrupt callback? The above could be simplified by keeping the ipc_send_message_and_get_response as is, but making it perform some additional action when the wait is interrupted.

int send_cancel(struct shim_ipc_message* msg) {
     cancel_msg = { POSIX_LOCK_CANCEL, msg->seq, ... };
     return ipc_send_message(cancel_msg);        
}

lock_msg = { POSIX_LOCK_SET, .wait = true, ... };

// this calls `on_interrupt` when the wait is interrupted, then keeps on waiting
// (or fails, if `on_interrupt` returned failure)
ipc_send_message_and_get_response(lock_msg, /*on_interrupt=*/&send_cancel, &result);
return result;

This would certainly be easiest to use, but it's a pretty specific use-case. I think it's worth it only if we expect to use it again soon.

So I think either 3 or 4 are viable. But maybe I'm missing something and there is an easier way?

The text was updated successfully, but these errors were encountered:

dimakuv · 2021-07-08T07:24:01Z

I prefer option 4.

Option 3 sounds like an overkill to fix this issue (and similar "amend/cancel pending operation" issues). Splitting our nice ipc_send_message_and_get_response() into two phases, with some state maintained in-between, sounds complex.

By the way, what is the struct shim_ipc_message* msg in your send_cancel callback? Why is this argument needed? And what is msg->seq, how does it help?

pwmarcz · 2021-07-08T09:03:35Z

Option 3 sounds like an overkill to fix this issue (and similar "amend/cancel pending operation" issues). Splitting our nice ipc_send_message_and_get_response() into two phases, with some state maintained in-between, sounds complex.

I agree; on the other hand, option 4 sounds over-specific. But since it seems cheap, maybe it doesn't hurt to try it out. I'll try finishing my branch with option 4 and we'll see how it looks.

By the way, what is the struct shim_ipc_message* msg in your send_cancel callback? Why is this argument needed? And what is msg->seq, how does it help?

It was supposed to be the original message (lock_msg). I need seq which is a message identifier (currently added by ipc_send_message_and_get_response), so that the "cancel" message says which operation to cancel. Actually, I'd probably like to use a generic void* arg parameter in the callback instead, to make it a bit more flexible; then lock_msg will be passed explicitly.

mkow · 2021-07-11T13:43:49Z

How does 4. work with both messages? Will we wait for responses for both? (original lock + cancel msg)
Overall it seems like an ok solution for now.

boryspoplawski · 2021-07-11T23:51:32Z

seq wasn't mean to be a public interface, it was intended to stay inside IPC implementation (i.e. not meant to be used as some other objects ID, which I guess 4 proposes - use seq as lock request ID). Using it outside and sending two messages with same seq may break some core assumptions.
ipc_send_message_and_get_response is not interruptible by design. It's meant to be an internal mechanism and all public IPC interfaces are meant to be building blocks for other subsystems. Response messages are supposed to come ~soon, i.e. they are not supposed to do sleeps/waits in the message recipient (destination IPC worker).
Also doing sleeps with functions other than thread_wait is not signal-safe/fully-signal-aware.

I would say you cannot get away from using something like:

struct X x;
add_to_global_list(&x);
ipc_send_msg(get_lock);
thread_wait();
if (!check_if_condition_happend(&x, is_lock_taken, atomic_wrt_callbacks=True)) {
    ipc_send_msg(cancel_get_lock);
}
remove_from_global_list(&x);
return check_if_condition_happend(&x, is_lock_taken, atomic_wrt_callbacks=False);

and have some callbacks on lock_get response messages. This might sound a bit like reimplementing responses, but this seems to be a pretty special case.
Also this sounds like something that might need to be built-in sync server/client code?

pwmarcz · 2021-07-19T11:31:11Z

OK, looks like no easy win here. On the other hand, reimplementing responses doesn't seem so bad: an IPC abstraction for "long-term operations" might be exactly what I need. I'll think about it more.

boryspoplawski · 2021-07-19T11:32:52Z

@pwmarcz It's quite possible that we need to add some more advanced IPC primitives than we have right now.

dimakuv assigned pwmarcz Jul 22, 2021

mkow transferred this issue from gramineproject/graphene Sep 9, 2021

dimakuv added enhancement New feature or request P: 2 labels Nov 25, 2021

pwmarcz mentioned this issue May 13, 2022

Future ideas for filesystem and multi-process synchronization #584

Open

dimakuv mentioned this issue Mar 24, 2023

[LibOS] RFC: support for System-V semaphores #1248

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: interruptible POSIX locks #12

RFC: interruptible POSIX locks #12

pwmarcz commented Jul 7, 2021 •

edited by mkow

dimakuv commented Jul 8, 2021

pwmarcz commented Jul 8, 2021

mkow commented Jul 11, 2021

boryspoplawski commented Jul 11, 2021 •

edited

pwmarcz commented Jul 19, 2021

boryspoplawski commented Jul 19, 2021

RFC: interruptible POSIX locks #12

RFC: interruptible POSIX locks #12

Comments

pwmarcz commented Jul 7, 2021 • edited by mkow

The problem

How to interrupt an operation over IPC?

dimakuv commented Jul 8, 2021

pwmarcz commented Jul 8, 2021

mkow commented Jul 11, 2021

boryspoplawski commented Jul 11, 2021 • edited

pwmarcz commented Jul 19, 2021

boryspoplawski commented Jul 19, 2021

pwmarcz commented Jul 7, 2021 •

edited by mkow

boryspoplawski commented Jul 11, 2021 •

edited