BoundedQueue improvement #243

Open
ksergey opened this issue Feb 6, 2023 · 4 comments


ksergey commented Feb 6, 2023

Hi!

Instead of allocating 2x the memory inside BoundedQueue, you could mmap(...) the same memory region twice:

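  // size must be a power of two (for mask_) and a multiple of the page size
  std::size_t const size = ...;
  int fd = memfd_create(...);
  ftruncate(fd, size);

  // Reserve 2 * size bytes of contiguous address space; nothing is accessible yet
  auto address = ::mmap(nullptr, size << 1, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
  if (address == MAP_FAILED) {
    throw ...;
  }
  data_ = static_cast<char*>(address);

  // Map the file over the first half of the reservation
  address = ::mmap(data_, size, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED, fd, 0);
  if (address != data_) {
    throw ...;
  }

  // Map the same file again over the second half, so both halves alias the same pages
  address = ::mmap(data_ + size, size, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED, fd, 0);
  if (address != data_ + size) {
    throw ...;
  }

  capacity_ = size;
  mask_ = capacity_ - 1;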

The technique is called a magic ring buffer.
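For readers who want to try the double mapping in isolation, here is a minimal, Linux-only sketch of the steps above. The class name `MirroredBuffer` and the memfd name are hypothetical and not part of quill; it assumes `memfd_create` is available (glibc 2.27 or newer) and that `size` is a multiple of the page size.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for memfd_create on some toolchains
#endif
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Hypothetical helper (not quill's API): maps the same `size` bytes twice,
// back to back, so data()[i] and data()[i + size] refer to the same byte.
class MirroredBuffer {
public:
  explicit MirroredBuffer(std::size_t size) : size_(size) {
    int fd = ::memfd_create("mirrored_buffer", 0);
    if (fd == -1) throw std::runtime_error("memfd_create failed");
    if (::ftruncate(fd, static_cast<off_t>(size)) == -1) {
      ::close(fd);
      throw std::runtime_error("ftruncate failed");
    }
    // Reserve 2 * size bytes of contiguous address space.
    void* base = ::mmap(nullptr, size * 2, PROT_NONE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (base == MAP_FAILED) {
      ::close(fd);
      throw std::runtime_error("mmap reserve failed");
    }
    data_ = static_cast<char*>(base);
    // Map the file over each half of the reservation.
    void* lo = ::mmap(data_, size, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED, fd, 0);
    void* hi = ::mmap(data_ + size, size, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED, fd, 0);
    ::close(fd);  // the mappings keep the memory alive
    if (lo != data_ || hi != data_ + size) {
      ::munmap(base, size * 2);
      throw std::runtime_error("mmap mirror failed");
    }
  }
  ~MirroredBuffer() { ::munmap(data_, size_ * 2); }
  MirroredBuffer(MirroredBuffer const&) = delete;
  MirroredBuffer& operator=(MirroredBuffer const&) = delete;

  char* data() const { return data_; }
  std::size_t size() const { return size_; }

private:
  char* data_ = nullptr;
  std::size_t size_;
};
```

With this in place, a write that starts near the end of the first half and runs past it lands at the start of the buffer automatically, so the queue never has to split a record in two.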

BTW nice logger!

Thanks

Owner

odygrd commented Feb 6, 2023

Hey, thanks for checking out the library and the queue.

Below is a short history of the quill queues.

I actually had the proposed implementation in earlier versions of the library, along with some cross-platform code for it. The code can be found here:

https://github.com/odygrd/quill/blob/v1.5.0/quill/include/quill/detail/BoundedSPSCQueue.h
https://github.com/odygrd/quill/blob/v1.5.0/quill/src/detail/misc/Os.cpp#L482

The issues I had with shm were:

  • On macOS there is no /dev/shm unless you explicitly create it, so the queue was using /tmp.
  • The logger would not work with the Android SDK because it couldn't create those files there. I could not find any way to make the logger work.
  • On Linux, to perform that kind of mapping you have to use MAP_SHARED. MAP_PRIVATE won't work because the OS has to sync the shm file. I am not sure of the exact performance cost of MAP_SHARED, but when I benchmarked it a few years ago I was seeing higher latency at around the 99.9th percentile, showing up as page faults, and I was never able to figure out how to eliminate it. I tried many different profilers and tools, including VTune. When I later moved to the post-v1.5.0 queue, that issue was gone.

Because of the above issues, I then moved to an implementation where the tail and the head would just wrap around, even for a non-power-of-2 size (https://github.com/odygrd/quill/blob/v2.6.0/quill/include/quill/detail/spsc_queue/BoundedQueue.h). That queue worked quite well and showed good performance even at the 99.9th percentile. The drawbacks were that the code was more complicated and required more branches.
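To illustrate where the extra branches come from, here is a rough sketch contrasting a mask-based wrap with a branch-based wrap for an arbitrary capacity. The helper names are hypothetical and this is not the v2.6.0 code; it ignores the reader position and only shows the placement decision.

```cpp
#include <cassert>
#include <cstddef>

// Power-of-two capacity: wrapping is a single AND, no branch.
inline std::size_t advance_masked(std::size_t pos, std::size_t n, std::size_t mask) {
  return (pos + n) & mask;
}

// Result of deciding where an n-byte record goes in a buffer of any capacity.
struct Placement {
  std::size_t offset;  // where the record should be written
  bool wrapped;        // true if the writer had to restart at offset 0
};

// Arbitrary capacity: a record that does not fit before the end of the buffer
// is placed at offset 0 instead, which costs a branch on every write.
inline Placement place_record(std::size_t write_offset, std::size_t n, std::size_t capacity) {
  if (write_offset + n > capacity) {
    return {0, true};
  }
  return {write_offset, false};
}
```

In a real queue the wrapped case also has to tell the reader where the valid data ended, which is part of what makes this variant harder to reason about.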

For v2.7.0 I decided to add the -DQUILL_X86ARCH flag, which tries to minimise cache pollution using _mm_prefetch, _mm_clflush and _mm_clflushopt. With the previous v2.6.0 queue, calculating the cache lines that have to be passed to those instructions would lead to even more complicated code (due to _end_of_recorded_space, for example), so I moved back to the previous v1.5.0 implementation, this time without using shm. Unfortunately I do not have access to the system I used for v1.5.0 with the mmap trick, but the v2.7.0 benchmarks on the current system all looked good.
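As a rough illustration of why contiguous records make the cache-line arithmetic simple, the sketch below counts and flushes the lines covered by one contiguous byte range. The helper names and the 64-byte line size are assumptions, this is not quill's code, and the flush is a no-op on non-x86-64 targets.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>
#endif

constexpr std::size_t kCacheLine = 64;  // assumed cache line size

// Number of cache lines touched by the byte range [addr, addr + len).
inline std::size_t cache_lines_spanned(std::uintptr_t addr, std::size_t len) {
  if (len == 0) return 0;
  std::uintptr_t first = addr / kCacheLine;
  std::uintptr_t last = (addr + len - 1) / kCacheLine;
  return static_cast<std::size_t>(last - first + 1);
}

// Flush every cache line covering [p, p + len). Data is written back, not lost.
inline void flush_range(void const* p, std::size_t len) {
#if defined(__x86_64__) || defined(_M_X64)
  auto addr = reinterpret_cast<std::uintptr_t>(p);
  std::uintptr_t end = addr + len;
  std::uintptr_t line = addr & ~(static_cast<std::uintptr_t>(kCacheLine) - 1);
  for (; line < end; line += kCacheLine) {
    _mm_clflush(reinterpret_cast<void const*>(line));
  }
#else
  (void)p;
  (void)len;
#endif
}
```

If a record could be split across the end of the buffer, the same loop would have to run over two separate ranges, which is the kind of complication described above.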

Having said that, it might be worth benchmarking at some point the current v2.7.0 queue, which allocates twice the memory, against the old mmap queue on the same system. That might show whether MAP_SHARED has any performance implications. If the performance turns out to be the same, it would probably be nice to use the mmap trick on Linux/Windows, for example, and to fall back to allocating twice the memory on unsupported platforms.
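One way a plain 2x allocation can stand in for the double mapping is sketched below. This is an assumption about the general idea, not a copy of the v2.7.0 queue: the class name is hypothetical, it is single-threaded, and it relies on each record being read with the same size it was written with.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical illustration: positions are masked into the first half, and a
// record that starts near the end simply spills into the second half. The
// reader finds it at the same address, so no mirroring is needed.
class DoubleAllocBuffer {
public:
  explicit DoubleAllocBuffer(std::size_t capacity_pow2)
      : capacity_(capacity_pow2), mask_(capacity_pow2 - 1), storage_(capacity_pow2 * 2) {}

  // Contiguous region for an n-byte record, or nullptr if n bytes are not free.
  char* prepare_write(std::size_t n) {
    if (capacity_ - (write_pos_ - read_pos_) < n) return nullptr;
    return storage_.data() + (write_pos_ & mask_);
  }
  void commit_write(std::size_t n) { write_pos_ += n; }

  // Start of the oldest unread record, or nullptr if the buffer is empty.
  char const* prepare_read() const {
    if (read_pos_ == write_pos_) return nullptr;
    return storage_.data() + (read_pos_ & mask_);
  }
  void finish_read(std::size_t n) { read_pos_ += n; }

private:
  std::size_t capacity_;
  std::size_t mask_;
  std::vector<char> storage_;  // 2 * capacity bytes
  std::size_t write_pos_ = 0;
  std::size_t read_pos_ = 0;
};
```

The cost is the extra memory; the benefit is that it needs no files, no MAP_SHARED, and no platform-specific code.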


takohack commented Sep 4, 2023

Hey, bro. I am curious why the author needs to allocate 2x memory inside BoundedQueue. Can we allocate only 1x memory?

Author

ksergey commented Sep 4, 2023

@takohack if you mean the master version of BoundedQueue, I don't know. Also, there is an unnecessary check of a variable that is already a power of 2.

I think it's an unfinished refactoring.
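For context, the usual power-of-2 helpers look roughly like the sketch below. The function names are hypothetical and not taken from quill; the point is that once a capacity has gone through the rounding step, checking it again is redundant.

```cpp
#include <cassert>
#include <cstddef>

// True if n is a power of two (exactly one bit set).
constexpr bool is_pow2(std::size_t n) { return n != 0 && (n & (n - 1)) == 0; }

// Smallest power of two that is >= n (returns 1 for n == 0).
// Assumes n is small enough that the result fits in std::size_t.
constexpr std::size_t next_pow2(std::size_t n) {
  std::size_t p = 1;
  while (p < n) {
    p <<= 1;
  }
  return p;
}
```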

@odygrd ^^

Owner

odygrd commented Oct 31, 2023

Thanks for letting me know. I fixed it here and also made some improvements, mainly cosmetic changes and small things.

https://github.com/odygrd/quill/pull/362/files

Throughput is about the same as the old version:

old: Throughput is 2.21 million msgs/sec average, total time elapsed: 1809 ms for 4000000 log messages

new: Throughput is 2.24 million msgs/sec average, total time elapsed: 1787 ms for 4000000 log messages
