Enqueue slower than without #1118

Closed
nicholas-a-guerra opened this issue Apr 2, 2024 · 3 comments
Labels
question Further information is requested

Comments

@nicholas-a-guerra

I understand this library is not focused on performance, but I'm curious whether I'm misunderstanding how enqueue works. My understanding is that it essentially just puts the log record into a queue and handles it in another thread (or async task, maybe?), which should minimize the blocking time of the logger.info call or similar. However, I'm seeing some odd results when benchmarking. Here's the script I'm running:

from time import perf_counter
from loguru import logger
import asyncio
import sys

async def main():
    base_loguru_times = base_loguru()
    enqueue_loguru_times = enqueue_loguru()
    
    print_analysis("Base Loguru", base_loguru_times)
    print_analysis("Enqueue Loguru", enqueue_loguru_times)

def base_loguru():
    times = []
    for _ in range(1000):
        start = perf_counter()
        logger.warning("Test")
        times.append(perf_counter()-start)
    return times

def enqueue_loguru():
    logger.remove()
    logger.add(sys.stdout, enqueue=True)
    times = []
    for _ in range(1000):
        start = perf_counter()
        logger.warning("Test")
        times.append(perf_counter()-start)
    logger.complete()
    return times


def print_analysis(test_name, times):
    min_time = min(times)
    max_time = max(times)
    avg_time = sum(times) / len(times)
    print('-'*50)
    print(test_name)
    print(f"Min: {min_time:.4f} | Max: {max_time:.4f} | Avg: {avg_time:.4f}")
    

if __name__ == "__main__":
    asyncio.run(main())

When running this, I am consistently seeing results like this:

--------------------------------------------------
Base Loguru
Min: 0.0003 | Max: 0.0088 | Avg: 0.0005
--------------------------------------------------
Enqueue Loguru
Min: 0.0004 | Max: 0.1264 | Avg: 0.0022

You can see that, oddly enough, enqueue is taking longer than without it. I'm trying to understand what I might be missing, or whether this is a bug. My first thought was that maybe the enqueue has some overhead up front, but that if the handler is very IO-intensive there could still be a net gain. But then, why would there be overhead for a queue at all? Surely adding to a queue should always be faster than printing to stdout, right?

@Delgan
Owner

Delgan commented Apr 2, 2024

Thanks for this insightful benchmark, @nicholas-a-guerra.

First, I made some minor adjustments to your script to improve its accuracy:

  • I made base_loguru() and enqueue_loguru() use the same handler, as otherwise they were tested with different sinks: sys.stderr (the default handler) and sys.stdout (the test handler), respectively.
  • I introduced a "warmup" call prior to taking measurements, as it's not uncommon for the logger to defer initialization of internals upon receiving the first message.
  • I increased the number of iterations between each measurement to ensure more reliable results.
  • I removed the usage of asyncio, as it was introducing unnecessary context switching overhead.
from time import perf_counter
from loguru import logger
import sys

def main():
    base_loguru_times = base_loguru()
    enqueue_loguru_times = enqueue_loguru()
    
    print_analysis("Base Loguru", base_loguru_times)
    print_analysis("Enqueue Loguru", enqueue_loguru_times)

def base_loguru():
    logger.remove()
    logger.add(sys.stderr, enqueue=False)
    logger.warning("Init")
    times = []
    for _ in range(1000):
        start = perf_counter()
        for _ in range(100):
            logger.warning("Test")
        times.append(perf_counter()-start)
    return times

def enqueue_loguru():
    logger.remove()
    logger.add(sys.stderr, enqueue=True)
    logger.warning("Init")
    times = []
    for _ in range(1000):
        start = perf_counter()
        for _ in range(100):
            logger.warning("Test")
        times.append(perf_counter()-start)
    logger.complete()
    return times


def print_analysis(test_name, times):
    min_time = min(times)
    max_time = max(times)
    avg_time = sum(times) / len(times)
    print('-'*50)
    print(test_name)
    print(f"Min: {min_time:.4f} | Max: {max_time:.4f} | Avg: {avg_time:.4f}")
    

if __name__ == "__main__":
    main()

With this script, I get these results on my machine (using 70e9df2 ref commit):

$ python bench.py 2>/dev/null
--------------------------------------------------
Base Loguru
Min: 0.0036 | Max: 0.0054 | Avg: 0.0038
--------------------------------------------------
Enqueue Loguru
Min: 0.0083 | Max: 0.0167 | Avg: 0.0098

That's pretty bad. Using enqueue=True is around 3 times slower on average.

Why is that? The main reason is that enqueue=True was never aimed at performance. Its primary purposes are asynchronous logging (being able to schedule a message without blocking) and multiprocessing safety (being able to log from multiple processes to the same sink). Consequently, some design choices were possibly made to the detriment of performance. One prominent factor contributing to the slowness is likely the direct usage of a multiprocessing.SimpleQueue, which implies noticeable overhead due to locking mechanisms, as opposed to a basic collections.deque.
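
To get a rough idea of the difference, here is a minimal standalone sketch (not Loguru code) comparing the producer-side cost of multiprocessing.SimpleQueue.put() with collections.deque.append(). A helper thread drains the SimpleQueue so the underlying pipe never fills up, playing roughly the role of the worker thread used for enqueue=True:

import threading
from collections import deque
from multiprocessing import SimpleQueue
from time import perf_counter

N = 10_000

sq = SimpleQueue()
# Drain the queue in the background so that put() never blocks on a full pipe.
drainer = threading.Thread(target=lambda: [sq.get() for _ in range(N)], daemon=True)
drainer.start()

start = perf_counter()
for _ in range(N):
    sq.put("Test")  # pickles the item and writes it to a lock-protected pipe
simple_queue_time = perf_counter() - start
drainer.join()

dq = deque()
start = perf_counter()
for _ in range(N):
    dq.append("Test")  # plain in-memory append, no locking or serialization
deque_time = perf_counter() - start

print(f"SimpleQueue.put: {simple_queue_time:.4f}s | deque.append: {deque_time:.4f}s")

On a typical machine, the SimpleQueue side should come out noticeably slower per message, which is the kind of per-call overhead paid by every logging call when enqueue=True.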

That being said, I recently refactored the enqueue=True implementation, which coincidentally removed the direct usage of SimpleQueue. Better performance should be observed with these changes, but they are not yet merged into the master branch. Here are the results of running the same script with my in-progress work (using be6ea9a ref commit):

$ python bench.py 2>/dev/null
--------------------------------------------------
Base Loguru
Min: 0.0037 | Max: 0.0059 | Avg: 0.0038
--------------------------------------------------
Enqueue Loguru
Min: 0.0037 | Max: 0.0719 | Avg: 0.0045

Much better, isn't it? But still slower than the base with enqueue=False...

This time, a second player comes into play: the GIL. You see, Loguru implements enqueue=True using an auxiliary thread that processes, in the background, the messages queued by the main thread. We would therefore expect these two tasks to run in parallel. Alas, there is no true parallelism with Python threads: the GIL prevents parallel execution of Python code within the same process, despite the use of multiple threads. The Python runtime therefore alternates between one thread and the other, sequentially.

The consequence of the above is that two CPU-bound tasks running in two threads will generally be no faster, and often slower, than if they were executed in a single thread. The overhead we're observing here with enqueue=True is likely caused by the Python interpreter pausing the main thread to execute parts of the code intended for the background thread.
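
To illustrate the effect independently of Loguru, here is a small sketch where the same amount of CPU-bound pure-Python work is executed once in a single thread and once split across two threads. Because of the GIL, the threaded version is not expected to be any faster:

import threading
from time import perf_counter

def spin(n):
    # Pure-Python busy loop; it holds the GIL while running.
    total = 0
    for i in range(n):
        total += i
    return total

N = 5_000_000

start = perf_counter()
spin(N)
spin(N)
print(f"Single thread: {perf_counter() - start:.3f}s")

start = perf_counter()
t1 = threading.Thread(target=spin, args=(N,))
t2 = threading.Thread(target=spin, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
print(f"Two threads:   {perf_counter() - start:.3f}s")

The two-thread version takes roughly as long as (often slightly longer than) the single-threaded one, which is the same mechanism that makes the main thread pay for the work of the enqueue=True worker thread.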

As of today, it's not possible to entirely remove the GIL, although there have been very recent changes in that direction. That's why we shouldn't assume that enqueue=True will improve an application's performance. It should instead be used to avoid blocking the calling thread with slow sinks, or to log to the same file from several processes.
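
For completeness, here is a hypothetical example (with a simulated slow sink, not part of the benchmark above) of the scenario where enqueue=True does pay off: the calling thread is no longer blocked by the sink's latency, only the background worker is:

import time
from time import perf_counter
from loguru import logger

def slow_sink(message):
    time.sleep(0.01)  # simulate a sink that takes 10 ms per message (slow disk, network, ...)

def measure(enqueue):
    logger.remove()
    logger.add(slow_sink, enqueue=enqueue)
    start = perf_counter()
    for _ in range(100):
        logger.warning("Test")
    elapsed = perf_counter() - start  # time spent in the calling thread only
    logger.complete()  # wait for the worker to finish processing enqueued messages
    return elapsed

print(f"enqueue=False: {measure(False):.2f}s spent in the caller")
print(f"enqueue=True:  {measure(True):.2f}s spent in the caller")

With enqueue=False the loop blocks for roughly one second, while with enqueue=True the calls return almost immediately and the waiting is deferred to logger.complete().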

@Delgan Delgan added the question Further information is requested label Apr 2, 2024
@nicholas-a-guerra
Author

Thanks for the immediate and very thorough feedback! Everything you shared clears things up. Especially with the adjustment to the queue usage, the new benchmark numbers look much more like what I'd expect. I've unfortunately had to go down the rabbit hole of logging performance due to some time-sensitive code. But as everyone is already aware, Python itself, not third-party libraries, is probably the first problem when trying to speed things up. I thought it was an interesting benchmark nonetheless.

It seems there may never be a perfect solution in my case. Perhaps I could even just write to stdout, pipe it to a file, and then parse the file in another process. Oh, the joys of no true parallelism in Python... Regardless, that clears up the details on my end. I appreciate the help!

@Delgan
Owner

Delgan commented Apr 3, 2024

The best you can do is wait for the GIL to be deactivated, or for the critical parts of Loguru to be re-implemented in C. Not sure which will come first... 😬
