Logging lock not released when held by OS threads #2013

Open
jaklaassen-affirm opened this issue Jan 22, 2024 · 3 comments

jaklaassen-affirm commented Jan 22, 2024

  • gevent version: 23.9.1, installed from PyPI via Poetry
  • Python version: CPython 3.9.18, installed via pyenv
  • Operating System: macOS Sonoma 14.2

Description:

In a monkey-patched environment, I'm seeing that when the main thread attempts to acquire a logging lock held by an OS thread, it can keep waiting long after the OS thread has released the lock. I have observed the main thread waiting up to 5 hours before successfully acquiring the lock.

I believe this is a complete repro case:

import logging
import time
import gevent
import gevent.monkey

gevent.monkey.patch_all()

pool = gevent.get_hub().threadpool

class PausingHandler(logging.StreamHandler):
    def emit(self, record):
        time.sleep(1)
        return super(PausingHandler, self).emit(record)
    
logging.basicConfig(level=logging.INFO, handlers=[PausingHandler()], format="%(message)s")

logger = logging.getLogger("gevent_test")
logger.info("STARTING")

def log_async(msg):
    while True:
        logger.info(msg)
        time.sleep(1)

def main():
    thread = pool.apply_async(log_async, ["IN_THREAD"])
    greenlet = gevent.spawn(log_async, "IN_GREENLET")
    
    thread.join()
    
if __name__ == "__main__":
    main()

This produces the following output; after the first IN_GREENLET, only the OS thread's messages keep appearing:

IN_GREENLET
IN_THREAD
IN_THREAD
IN_THREAD
...

Following the suggestion in https://www.gevent.org/changelog.html#id30, I added a call to gevent.get_hub() at the top of log_async, which seems to have fixed the problem.
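
For reference, this is what that workaround looks like applied to the repro above; only log_async changes, the rest of the script stays the same:

def log_async(msg):
    # Per the changelog note: give this OS thread its own hub before it ever
    # touches the (monkey-patched) logging lock.
    gevent.get_hub()
    while True:
        logger.info(msg)
        time.sleep(1)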

Thanks for all your hard work on gevent! It's an amazing achievement.

@jamadden
Member

Sorry, I don't have any better solution than that (gevent can't do that itself because of the way locks are used in the interpreter; and besides, multiple hubs are generally not a good thing). I do see a possible improvement in changing from a spinlock to native timeouts (deep in the internals) now that all supported Python versions can do that, but I doubt that would make much of a difference.

Unless otherwise documented, most lock implementations aren't guaranteed to be fair. That holds true for gevent locks, and they're even less fair across threads.

Mixing threads and greenlets and locks is a very very delicate, tricky thing. If at all possible, only do computation in a thread; don't try to interact with shared gevent objects (such as locks) from two different threads. Honestly, IIRC, the fact that I was able to get any of that working at all was something of a minor miracle.
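
Roughly the shape I mean, as a sketch only (nothing here is a blessed gevent pattern, and the function names are invented): hand the OS thread nothing but pure computation, and do all of the logging from the greenlet that called apply():

import gevent.monkey
gevent.monkey.patch_all()

import logging
import gevent

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("gevent_test")
pool = gevent.get_hub().threadpool

def pure_computation(n):
    # Runs in the OS thread: no logging, no locks, no shared gevent objects.
    return sum(i * i for i in range(n))

def worker(n):
    # Runs as a greenlet on the main thread's hub. pool.apply() blocks only
    # this greenlet until the OS thread finishes; the logging lock is then
    # taken here, so it is only ever touched from greenlets on one hub.
    result = pool.apply(pure_computation, (n,))
    logger.info("result for %s = %s", n, result)

gevent.joinall([gevent.spawn(worker, n) for n in (10, 100, 1000)])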

jamadden added a commit that referenced this issue Feb 13, 2024
@jamadden
Member

Hmm. Interestingly, disabling most of gevent's C extensions suddenly makes the threads ping-pong back and forth. Off the top of my head, I'm not sure why that would be.

$ PURE_PYTHON=1 python /tmp/example.py
STARTING
IN_GREENLET
IN_THREAD
IN_GREENLET
IN_THREAD
IN_GREENLET
IN_THREAD
IN_GREENLET
IN_THREAD
IN_GREENLET
IN_THREAD


jaklaassen-affirm commented Mar 2, 2024

Thanks for the response, and sorry to go silent; I've been kinda slammed. Our specific use case is Confluent's Kafka library, which starts threads from C that call back into Python callbacks for logging. I should really look into whether those callbacks can just be invoked from the main thread; that would help gevent users, of which there must be many.
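
For illustration, here is roughly the kind of hand-off I have in mind, as a sketch only: the callback name and signature are made up (not Confluent's actual API), and run_callback_threadsafe is the gevent hook mentioned further down. The foreign thread never touches the logging lock and just schedules the real call onto the main hub's loop:

import gevent.monkey
gevent.monkey.patch_all()

import logging
import gevent

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("kafka_bridge")

# Capture the main thread's hub while we are still on the main thread.
main_hub = gevent.get_hub()

def log_cb(message):
    # Called by the client library from an OS thread it created in C.
    # Don't take the (monkey-patched) logging lock here; just schedule the
    # real logging call onto the main hub's loop and return immediately.
    main_hub.loop.run_callback_threadsafe(logger.info, "%s", message)

# Simulate the foreign thread with the hub's threadpool:
main_hub.threadpool.apply(log_cb, ("hello from an OS thread",))
gevent.sleep(0.1)  # let the main hub run the scheduled callback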

This gave me a chance to dig into the Gevent internals, which was at least fun for me. I see the pain and the problem.

I made a simpler repro case that also reproduces with PURE_PYTHON=1:

import threading
import gevent
import gevent.monkey
gevent.monkey.patch_all()

hub = gevent.get_hub()
pool = hub.threadpool

def thread_fn(evt, lock):
    print("acquire in thread")
    lock.acquire()
    print("evt.wait() in thread")
    while not evt.is_set():
        gevent.sleep(1)
    evt.wait()
    print("lock.release() in thread")
    lock.release()

def greenlet_fn(evt, lock):
    res = lock.acquire()
    print(f"lock.acquire() in greenlet = {res}")

def main():
    evt = threading.Event()
    lock = threading.Lock()
    
    thread = pool.apply_async(thread_fn, [evt, lock])
    gevent.sleep(1)
    print("Starting greenlet")
    greenlet = gevent.spawn(greenlet_fn, evt, lock)
    gevent.sleep(1)
    print("Setting event")
    evt.set()
    thread.join()
    gevent.wait([greenlet], timeout=5)
    print(f"greenlet is ready={greenlet.ready()}")
    
if __name__ == "__main__":
    main()

Adding

lock.acquire()
lock.release()

before gevent.wait([greenlet], timeout=5) makes the greenlet finish, but reading the code it's obvious why: releasing the lock on the main thread notifies the links on the main thread, whereas we could not switch to those links when the notification came from the OS thread.

I do wonder if Semaphore._handle_unswitched_notifications could be written to do what AbstractLinkable._handle_unswitched_notifications does, but rather than calling hub.loop.run_callback_threadsafe(link, self), call hub.loop.run_callback_threadsafe(AbstractLinkable._check_and_notify, self). I can see the potential for an infinite loop there, which might be bad, and of course the possibility of all sorts of problems that I can't see.

“Honestly, IIRC, the fact that I was able to get any of that working at all was something of a minor miracle.”

Now that I've really read the code, I completely and totally agree 😄
