CA-387861 Introduce fair locking subsystem #683
Conversation
Probably should have an update to https://github.com/xapi-project/sm/blob/master/mk/sm.spec.in ?
Needs an update to .github/workflows/main.yml to repeat the python path changes
Were you mixing byte-range and whole-file locks (or attempting overlapping byte-range locks) previously? That one is known to have fairness issues: https://rusty.ozlabs.org/2010/08/13/fcntl-lock-starvation-and-tdb.html

In Python there might be a simpler way to get a process-wide fair lock, though, by using shared pthread mutexes (this is simple if you want to share the lock with child processes, and more complicated if you want to share it with other processes): https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Lock

Perhaps a Manager object could be used for sharing Lock objects? https://docs.python.org/3/library/multiprocessing.html#using-a-remote-manager
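As a hedged sketch of the Manager-based suggestion above (the worker function and all names are illustrative, not taken from the PR):

```python
# Sketch: sharing one multiprocessing.Lock across processes via a Manager.
# "critical_section" is an invented example, not code from this PR.
import multiprocessing as mp

def critical_section(lock, results, i):
    with lock:                      # acquisition is brokered by the manager process
        results.append(i)

if __name__ == "__main__":
    with mp.Manager() as manager:
        lock = manager.Lock()       # proxy object, usable from any child process
        results = manager.list()
        procs = [mp.Process(target=critical_section, args=(lock, results, i))
                 for i in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(sorted(results))      # -> [0, 1, 2, 3]
```

Whether acquisition order under a Manager-proxied Lock is actually fair is a separate question; as the rest of this thread notes, there is no documented fairness promise to rely on.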
None of the file-based locking systems available are "fair", as in, there is no guarantee that the first waiter will not be usurped when the lock is released by later waiters, thus suffering from lock starvation. This has been something of a problem across various attempts to wrap access to LVM, and indeed it turns out that multipath commands also need to be serialised on some of the same resources.

This adds a small templated systemd service which does nothing but accept one connection at a time on a UNIX Domain socket, with a fairly long accept queue. The service is started when needed (thus every process using this mechanism needs permission to start the service), and the "Lock" is acquired by connecting to the socket. connect() on UNIX Domain sockets does not time out, and waiting connections are held in a queue, providing fairness.

The python3 context manager is provided via a metaclass which checks the arguments and allows only one of each named Fairlock object to exist, so that attempts to take a lock when it is already held can raise an exception. This is not thread-safe but could probably be made so.

Signed-off-by: Tim Smith <tim.smith@cloud.com>
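A minimal sketch of the connect()-as-lock idea described above (this is an illustration, not the PR's actual implementation; the service side, metaclass, and socket path are omitted or assumed):

```python
# Sketch: the service accepts one connection at a time, so holding an
# accepted connection *is* holding the lock. Socket path handling is an
# assumption for illustration.
import socket

class Fairlock:
    def __init__(self, sockname):
        self.sockname = sockname          # e.g. a per-resource path under /run

    def __enter__(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.sockname)  # blocks in the kernel's accept queue
        return self

    def __exit__(self, *exc):
        self.sock.close()                 # dropping the connection releases the lock
        return False
```

Waiters block inside connect(), which on UNIX Domain sockets does not time out, and the kernel holds pending connections in a queue, which is where the fairness comes from.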
Added. Also fixed a couple of rpmlint issues (should not specify BuildRoot, or reference RPM_BUILD_ROOT or %buildroot in the %build section).
Neither fcntl() nor flock() is fair. We've run into this issue more than once. I wouldn't call it a bug unless you can find somewhere any sort of fairness promise which is being violated. I know of no such promise. We might wish they were fair, but wishes are sadly not ponies...
In a single-threaded process, there is no need for locks at all. Since this code is single-threaded, the locking requirement is clearly between processes, so this won't help :-)
If you look at the implementation, it is a bit smarter than that: it uses named semaphores (in /dev/shm) https://github.com/python/cpython/blob/main/Modules/_multiprocessing/semaphore.c, so it does work between processes.
POSIX states that "the highest priority thread that has been waiting the longest shall be unblocked" only when either SCHED_FIFO or SCHED_RR applies to the blocked process. Apart from that, I see no promise of fairness in POSIX or SYSV semaphores. And we've already been round the loop trying to manage a queue with locks. This is simpler and cleaner for our current purposes.
self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
try:
    self.sock.connect(self.sockname)
except (FileNotFoundError, ConnectionRefusedError):
There's a comment in fairlock.c saying that we'd also get this error if there were more than 64 waiters. I assume that trying to start the service when it is already running would be harmless, but we are only retrying the connection once. I take it that in reality we wouldn't expect to have that many waiters, so there's no point doing anything more involved. Yes?
Local experimentation suggests that python's connect() handles the case of >64 waiters on the client side quite comfortably. While connecting to a Unix domain socket when the listen queue is full is allowed to return ECONNREFUSED, in practice on Linux it is EAGAIN, which python handles for us.
The check for ECONNREFUSED here is mostly to cover the one-time-only race condition where we are the second client in, and the service has been started by the first client in but has not quite got around to listening yet.
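The retry-once pattern being discussed could be sketched like this (start_service is a placeholder standing in for starting the templated systemd unit; the function name and grace period are invented for illustration):

```python
# Hedged sketch: the first connect() can fail with ENOENT (socket file not
# yet created) or ECONNREFUSED (service started but not yet listening);
# start the service and retry exactly once.
import socket
import time

def acquire(sockname, start_service):
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(sockname)
    except (FileNotFoundError, ConnectionRefusedError):
        start_service()          # starting an already-running unit is harmless
        time.sleep(0.1)          # brief grace period before the single retry
        sock.connect(sockname)   # a second failure propagates to the caller
    return sock
```

A second failure is deliberately not caught: as noted above, the >64-waiter case is handled client-side as EAGAIN, so anything else reaching the caller is a genuine error.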
self.assertEquals(l, n)
# Real code would use another 'with Fairlock("test")' here but we cannot | ||
# do that because it insists on having a code block as a body, which would | ||
# then not be reached, causing a "Test code not fully covered" failure |
Much of a muchness, but you can use # pragma: no cover
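For illustration, the suggested pragma could be exercised with a toy stand-in (ToyLock and AlreadyHeld are invented here; the real code uses the Fairlock metaclass):

```python
# Toy illustration of "# pragma: no cover": the body of the nested 'with'
# is never reached, and the pragma excludes it from coverage accounting
# instead of failing the "not fully covered" check.
class AlreadyHeld(Exception):
    pass

class ToyLock:
    held = False
    def __enter__(self):
        if ToyLock.held:
            raise AlreadyHeld()   # second acquisition attempt raises
        ToyLock.held = True
        return self
    def __exit__(self, *exc):
        ToyLock.held = False
        return False

try:
    with ToyLock():
        with ToyLock():
            pass  # pragma: no cover  -- unreachable, excluded from coverage
except AlreadyHeld:
    print("re-entry raised as expected")
```

The pragma is recognised by coverage.py's default exclusion list, so no configuration change should be needed.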
while ((fd = accept(sock, NULL, NULL)) > -1) {
    char buffer[128];

    do {} while (read(fd, buffer, sizeof(buffer)) > 0);
If -1 is returned and errno == EINTR for some reason, I expect that it should continue the loop rather than exiting; otherwise the lock might be dropped too early.
This is indeed important if one is going to handle any signals. Fortunately the only signals coming in here will be ones which terminate the process entirely :-)