fix(pubsub): off-by-one error leading to bundlers never being freed #4253
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here with `@googlebot I signed it!`.
What to do if you already signed the CLA: Individual signers, Corporate signers.
ℹ️ Googlers: Go here for more info.
@googlebot I signed it!
Thanks for submitting this PR, this is a good catch!
@mzanibelli So this fix actually uncovered an issue with our other methods (specifically
@hongalex More mutexes can't be added for flushing, as this could force a handling of the bundle, I believe. Switching just bundlers to a
Hello! Indeed, I did not run the tests with the race detector enabled.
Hey! I have attempted something which is not very clean, but the general idea is: the race being on Now here is the catch: not sure what happens if we It's not very clean yet and I must admit I don't like this solution, but the tests pass with the race detector, so I might as well mention it to you guys.
I have a very rough draft of replacing everything with sync.Map and it seems to be fine (as in the tests pass). From some initial benchmarking, replacing map with sync.Map hasn't affected unordered publish performance. I'll send out a PR of this later today after benchmarking with ordered keys as well. |
I retried the
Better solution here. |
Context
When adding an item to the publish scheduler with message ordering enabled, a new bundler is added for this particular key.
The publisher keeps an internal state of the current bundlers and is supposed to remove a bundler when all of its items have been processed.
To know if the bundler is ready for cleanup, an internal counter is incremented on Add() and decremented on handle().
Problem
This counter is erroneously initialized at 2, so it can never reach 0 and the bundler is never freed.
Consequence
This causes a memory leak when a large number of distinct ordering keys is used: the internal bundlers list keeps growing and can reach millions of entries.
Remediation
Don't execute the increment statement upon bundler initialization, since the internal counter already starts at 1.