
CA-387861 Introduce fair locking subsystem #683

Merged: 1 commit into xapi-project:master on Apr 22, 2024

Conversation

TimSmithCtx
Contributor

None of the file-based locking systems available are "fair", in the sense that there is no guarantee that the first waiter will acquire the lock when it is released rather than being usurped by a later waiter, so a waiter can suffer lock starvation. This has been something of a problem across various attempts to wrap access to LVM, and indeed it turns out that multipath commands also need to be serialised on some of the same resources.

This adds a small templated systemd service which does nothing but accept one connection at a time on a UNIX Domain socket, with a fairly long accept queue. The service is started when needed (thus every process using this mechanism needs permission to start the service), and the "Lock" is acquired by connecting to the socket. connect() on UNIX Domain sockets does not time out, and waiting connections are held in a queue, providing fairness.
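For illustration, here is a minimal Python-only sketch of what each per-lock service instance does (the PR implements this as a small C program run from a templated systemd unit; the socket path and backlog value below are illustrative, not the PR's exact ones):

import os
import socket

SOCK_PATH = "/run/fairlock/demo"      # illustrative path only

os.makedirs(os.path.dirname(SOCK_PATH), exist_ok=True)
if os.path.exists(SOCK_PATH):
    os.unlink(SOCK_PATH)

srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
srv.bind(SOCK_PATH)
srv.listen(64)                        # a fairly long accept queue; waiters queue here in order

while True:
    conn, _ = srv.accept()            # the next waiter in the queue now holds the "lock"
    while conn.recv(128):             # hold it until the client disconnects (EOF)
        pass
    conn.close()                      # released; go round and accept the next waiter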

The python3 context manager is provided via a metaclass which checks the arguments and allows only one of each named Fairlock object to exist, so that attempts to take a lock when it is already held can raise an exception. This is not thread-safe but could probably be made so.
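For concreteness, a minimal sketch of that one-instance-per-name pattern (names and details here are assumptions for illustration, not the PR's exact code):

class _OnePerName(type):
    # Not thread-safe, mirroring the caveat above.
    _instances = {}

    def __call__(cls, name):
        if not isinstance(name, str) or not name:
            raise TypeError("lock name must be a non-empty string")
        key = (cls, name)
        if key not in _OnePerName._instances:
            _OnePerName._instances[key] = super().__call__(name)
        return _OnePerName._instances[key]

class Fairlock(metaclass=_OnePerName):
    def __init__(self, name):
        self.name = name
        self.held = False

    def __enter__(self):
        if self.held:
            raise RuntimeError(f"Fairlock {self.name!r} is already held")
        # ...connect to the per-lock socket here to actually take the lock...
        self.held = True
        return self

    def __exit__(self, *exc):
        # ...close the socket here to release the lock...
        self.held = False
        return False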

@MarkSymsCtx
Contributor

Probably should have an update to https://github.com/xapi-project/sm/blob/master/mk/sm.spec.in ?

@MarkSymsCtx
Contributor

Needs an update to .github/workflows/main.yml to repeat the python path changes

@edwintorok
Contributor

Were you mixing byte range and whole file locks (or attempting overlapping byte range locks) previously? That one is known to have fairness issues: https://rusty.ozlabs.org/2010/08/13/fcntl-lock-starvation-and-tdb.html
Although there is some evidence that flock is not fair https://code.jjb.cc/linux-flock-does-not-provide-fair-locking (pointing to potentially a bug in the kernel?)
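For reference, the two kernel locking flavours being contrasted here, as exposed in Python (illustrative snippet; the lock file path is made up):

import fcntl
import os

fd = os.open("/tmp/demo.lock", os.O_RDWR | os.O_CREAT, 0o600)

# Whole-file lock: flock(2), one lock per open file description.
fcntl.flock(fd, fcntl.LOCK_EX)
fcntl.flock(fd, fcntl.LOCK_UN)

# Byte-range (POSIX record) lock: fcntl(2)/lockf, locks a region of the file.
fcntl.lockf(fd, fcntl.LOCK_EX, 1, 0)   # lock 1 byte starting at offset 0
fcntl.lockf(fd, fcntl.LOCK_UN, 1, 0)

os.close(fd)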

In Python there might be a simpler way to get a process-wide fair lock, though, by using shared pthread mutexes (this is simple if you want to share the lock with child processes and more complicated if you want to share it with other processes). https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Lock
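For example, the simple child-process case would be a sketch along these lines:

import multiprocessing as mp

def worker(lock, i):
    with lock:                        # the Lock object is passed down to each child
        print(f"worker {i} holds the lock")

if __name__ == "__main__":
    lock = mp.Lock()
    procs = [mp.Process(target=worker, args=(lock, i)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()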

Perhaps a Manager object could be used for sharing Lock objects? https://docs.python.org/3/library/multiprocessing.html#using-a-remote-manager
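Something along the lines of the "remote manager" example in those docs could look like this (sketch only; the address, authkey and registration name are made up):

# Server process: hold the lock and serve it to clients over a socket.
import threading
from multiprocessing.managers import BaseManager

class LockManager(BaseManager):
    pass

if __name__ == "__main__":
    shared = threading.Lock()
    LockManager.register("get_lock", callable=lambda: shared)
    mgr = LockManager(address=("127.0.0.1", 50000), authkey=b"demo")
    mgr.get_server().serve_forever()

# Client processes (run separately) would then do something like:
#
#   LockManager.register("get_lock")
#   mgr = LockManager(address=("127.0.0.1", 50000), authkey=b"demo")
#   mgr.connect()
#   lock = mgr.get_lock()
#   lock.acquire()
#   ...critical section...
#   lock.release()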

@TimSmithCtx
Contributor Author

Probably should have an update to https://github.com/xapi-project/sm/blob/master/mk/sm.spec.in ?

Added. Also fixed a couple of rpmlint issues (Should not specify BuildRoot, or reference RPM_BUILD_ROOT or %buildroot in the %build section)

@TimSmithCtx
Contributor Author

Were you mixing byte range and whole file locks (or attempting overlapping byte range locks) previously? That one is known to have fairness issues: https://rusty.ozlabs.org/2010/08/13/fcntl-lock-starvation-and-tdb.html Although there is some evidence that flock is not fair https://code.jjb.cc/linux-flock-does-not-provide-fair-locking (pointing to potentially a bug in the kernel?)

Neither fcntl() nor flock() is fair. We've run into this issue more than once.

I wouldn't call it a bug unless you can find somewhere any sort of fairness promise which is being violated. I know of no such promise. We might wish they were fair, but wishes are sadly not ponies...

In Python there might be a simpler way to get a process-wide fair lock, though, by using shared pthread mutexes (this is simple if you want to share the lock with child processes and more complicated if you want to share it with other processes). https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Lock

In a single-threaded process, there is no need for locks at all. Since this code is single-threaded, the locking requirement is clearly between processes, so this won't help :-)

@edwintorok
Contributor

In Python there might be a simpler way to get a process-wide fair lock, though, by using shared pthread mutexes (this is simple if you want to share the lock with child processes and more complicated if you want to share it with other processes). https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Lock

In a single-threaded process, there is no need for locks at all. Since this code is single-threaded, the locking requirement is clearly between processes, so this won't help :-)

If you look at the implementation, it is a bit smarter than that: it uses named semaphores (in /dev/shm) https://github.com/python/cpython/blob/main/Modules/_multiprocessing/semaphore.c, so it does work between processes.

MarkSymsCtx requested a review from rdn32 on April 19, 2024, 09:40
@TimSmithCtx
Contributor Author

If you look at the implementation, it is a bit smarter than that: it uses named semaphores (in /dev/shm) https://github.com/python/cpython/blob/main/Modules/_multiprocessing/semaphore.c, so it does work between processes.

POSIX states that "the highest priority thread that has been waiting the longest shall be unblocked" only when either SCHED_FIFO or SCHED_RR applies to the blocked process. Apart from that, I see no promise of fairness in POSIX or SYSV semaphores.

And we've already been round the loop trying to manage a queue with locks. This is simpler and cleaner for our current purposes.

self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
try:
    self.sock.connect(self.sockname)
except (FileNotFoundError, ConnectionRefusedError):

There's a comment in fairlock.c saying that we'd also get this error if there were more than 64 waiters. I assume that trying to start the service when it is already running would be harmless, but we are only retrying the connection once. I take it that in reality we wouldn't expect to have that many waiters, so there's no point doing anything more involved. Yes?

Contributor Author

Local experimentation suggests that Python's connect() handles the case of >64 waiters on the client side quite comfortably. While connecting to a Unix domain socket when the listen queue is full is allowed to return ECONNREFUSED, in practice on Linux it is EAGAIN, which Python handles for us.

The check for ECONNREFUSED here is mostly to cover the one-time-only race condition where we are the second client in, and the service has been started by the first client in but has not quite got around to listening yet.
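Putting those pieces together, the client-side acquire path being discussed looks roughly like this (a sketch only; the socket path and unit name are illustrative assumptions, not necessarily the PR's exact ones):

import socket
import subprocess

def acquire(name):
    sockname = f"/run/fairlock/{name}"                     # illustrative path
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(sockname)
    except (FileNotFoundError, ConnectionRefusedError):
        # Service not created yet, or created but not yet listening: start it
        # (harmless if already running) and retry the connection once.
        subprocess.run(["systemctl", "start", f"fairlock@{name}.service"],
                       check=True)
        sock.connect(sockname)                             # blocks until it is our turn
    return sock                                            # keep the socket open while holding the lock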

self.assertEquals(l, n)
# Real code would use another 'with Fairlock("test")' here but we cannot
# do that because it insists on having a code block as a body, which would
# then not be reached, causing a "Test code not fully covered" failure
Much of a muchness, but you can use # pragma: no cover
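That is, the unreachable body can be excluded from coverage explicitly, something like (illustrative; the exact exception type the code raises is not shown here):

with self.assertRaises(Exception):
    with Fairlock("test"):
        pass  # pragma: no cover -- never reached: re-entering raises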

while ((fd = accept(sock, NULL, NULL)) > -1) {
    char buffer[128];

    do {} while (read(fd, buffer, sizeof(buffer)) > 0);
Contributor

If -1 is returned and errno == EINTR for some reason, I expect that it should continue the loop rather than exiting; otherwise the lock might be dropped too early.

Contributor Author

This is indeed important if one is going to handle any signals. Fortunately the only signals coming in here will be ones which terminate the process entirely :-)

TimSmithCtx merged commit a077ecd into xapi-project:master on Apr 22, 2024
2 checks passed