Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LeakyBucket does not release the Semaphore if a request fails or the task is canceled #1055

Open
nozzlegear opened this issue Apr 16, 2024 · 0 comments

Comments

@nozzlegear
Copy link
Owner

nozzlegear commented Apr 16, 2024

I've run into an interesting problem in production, where an app was suddenly swamped with thousands of calls to the Product Updated webhook endpoint, all for the same store. The LeakyBucketExecutionPolicy dutifully put all the requests into a leaky bucket, but it just couldn't get through them fast enough before Shopify begin to cancel the requests once they hit that 5 second mark.

The problem: the internal LeakyBucket itself does not wrap the semaphore.WaitAsync in a try/catch block, which means any task cancelation exception thrown here will cause the semaphore to not be released. Additionally, the request will not be removed from the queue.

The effect is that Shopify will most likely retry the webhook immediately, adding another request to the queue. The semaphore sees that the previous request is still there, so the new one needs to wait. This wouldn't be a problem for a low-volume endpoint, eventually this would all shake out and the semaphore would empty itself with a few exception messages about canceled tasks in your logs.

But with a high-volume endpoint where you're receiving hundreds of requests per second, the problem just compounds itself and the requests will sit in the LeakyBucket queue for 10, 15 or 30 minutes before their chance to be processed comes up. At this point, the only thing that can save them is Shopify's exponential backoff, where we wait for Shopify to stop trying the endpoints so quickly (and pray that more events aren't generated in the meantime).

The fix for this is to check whether the cancellation token has been canceled before we try to have the semaphore wait on it. If so, the request should be dequeued and skipped. Additionally, we should wrap the semaphore usage in a try/finally, then release the semaphore and dequeue the request if the request is cancelled.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant