Limit request queue to fail fast #50

Open
alpe opened this issue Jan 9, 2024 · 1 comment

Comments

@alpe
Contributor

alpe commented Jan 9, 2024

Incoming requests are queued in memory until capacity on a serving backend becomes available. This can become critical in peak-load or DoS scenarios. Instead of leaving the queue unbounded, we should fail fast and reject new requests with StatusServiceUnavailable (503).
The total queue limit could be dynamic and/or a fixed value (due to memory limitations).
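A minimal sketch of the fail-fast behavior, assuming a Go HTTP middleware placed in front of the request queue. The type `queueLimiter`, its fields, and the `demo` helper are hypothetical and not part of Lingo's code. For simplicity it counts requests that are either queued or already being proxied, and `maxQueued <= 0` keeps today's unbounded behavior.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// queueLimiter is a hypothetical admission gate in front of the request queue.
// It counts requests that are queued or being proxied and rejects new ones
// with 503 once maxQueued is reached. maxQueued <= 0 disables the limit.
type queueLimiter struct {
	maxQueued int64
	inFlight  atomic.Int64
}

func (q *queueLimiter) Wrap(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if q.maxQueued > 0 {
			// Reserve a slot first, then check, so concurrent requests cannot overshoot.
			if q.inFlight.Add(1) > q.maxQueued {
				q.inFlight.Add(-1) // roll back the reservation
				http.Error(w, "request queue is full", http.StatusServiceUnavailable)
				return
			}
			defer q.inFlight.Add(-1) // release the slot when the request finishes
		}
		next.ServeHTTP(w, r)
	})
}

// demo fills a limiter of size 1 with one blocked request, then sends a second one.
func demo() (firstCode, secondCode int) {
	entered := make(chan struct{})
	release := make(chan struct{})
	backend := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		close(entered) // only one request is admitted, so this runs once
		<-release      // simulate waiting for backend capacity
		w.WriteHeader(http.StatusOK)
	})
	h := (&queueLimiter{maxQueued: 1}).Wrap(backend)

	first := httptest.NewRecorder()
	done := make(chan struct{})
	go func() {
		h.ServeHTTP(first, httptest.NewRequest(http.MethodPost, "/", nil))
		close(done)
	}()
	<-entered // the first request now holds the only slot

	second := httptest.NewRecorder()
	h.ServeHTTP(second, httptest.NewRequest(http.MethodPost, "/", nil))

	close(release)
	<-done
	return first.Code, second.Code
}

func main() {
	first, second := demo()
	fmt.Println(first, second) // 200 503
}
```

Reserving the slot with a single atomic `Add` and rolling it back on overflow avoids a check-then-increment race between concurrent requests.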

For dynamic calculations: `factor * total_number_of_replicas * concurrent_requests_per_replica`. The factor should be chosen with respect to the time required to scale up instances. I think I saw 10x somewhere in a similar project, but I cannot find the number now. It would be a good starting value to customize for different environments.
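The dynamic formula can be sketched as a small function. `queueLimit` and its parameters are hypothetical names; the optional fixed cap reflects the "dynamic and/or fixed value" idea, and the 10x factor in the usage is only the unconfirmed starting point mentioned in this comment. One assumption worth flagging: if a model is scaled to zero, the raw product is 0 and every request would be rejected, so this sketch counts at least one replica.

```go
package main

import "fmt"

// queueLimit sketches the proposed dynamic limit:
// factor * total_number_of_replicas * concurrent_requests_per_replica,
// optionally clamped by a fixed cap (e.g. one derived from available memory).
// hardCap <= 0 means no fixed cap. Parameter names are illustrative.
func queueLimit(factor, replicas, concurrentPerReplica, hardCap int) int {
	// Assumption: with zero replicas (scaled to zero) the product would be 0,
	// so count at least one replica that is being brought up.
	if replicas < 1 {
		replicas = 1
	}
	limit := factor * replicas * concurrentPerReplica
	if hardCap > 0 && limit > hardCap {
		limit = hardCap
	}
	return limit
}

func main() {
	fmt.Println(queueLimit(10, 3, 4, 0))   // 120: 10 * 3 * 4
	fmt.Println(queueLimit(10, 3, 4, 100)) // 100: clamped by the fixed cap
	fmt.Println(queueLimit(10, 0, 4, 0))   // 40: scaled to zero, treated as one replica
}
```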

@samos123
Contributor

samos123 commented Jan 13, 2024

I guess if you have large requests and provide Lingo as a public service, this would be a real concern. Let's assume each Lingo instance can have at most 60k open connections and each request is 1 MB; then you would need 60 GB of memory to hold those requests.
Someone who runs a large public Lingo instance might have other DDoS protections in place on top of Lingo (e.g. an API gateway or other software that includes such protection), and in that case wouldn't need this feature.
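The memory estimate is a simple multiplication, and the same arithmetic can be inverted to derive a fixed queue limit from a memory budget. Both helpers are hypothetical, use decimal units to match the 60k × 1 MB = 60 GB figure, and the 8 GB budget is an illustrative number, not one taken from this thread.

```go
package main

import "fmt"

const (
	mb = 1_000_000     // decimal megabyte, matching the figures above
	gb = 1_000_000_000 // decimal gigabyte
)

// worstCaseBytes is the memory needed if every open connection holds a fully buffered request body.
func worstCaseBytes(maxConns, bytesPerRequest int64) int64 {
	return maxConns * bytesPerRequest
}

// maxQueuedFor inverts the estimate: how many such requests fit into a memory budget.
func maxQueuedFor(budgetBytes, bytesPerRequest int64) int64 {
	return budgetBytes / bytesPerRequest
}

func main() {
	fmt.Println(worstCaseBytes(60_000, 1*mb) / gb) // 60 (GB)
	fmt.Println(maxQueuedFor(8*gb, 1*mb))          // 8000 requests
}
```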

My vote would be to postpone this until we have a user who runs Lingo on a public endpoint. I am not against including this, though. @nstogner, your thoughts?

If you implement this, I would want a default of unlimited, or a number so large that a user with plenty of memory and no malicious actors (e.g. an internal Lingo deployment) wouldn't encounter an error.
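One way to get this default is to treat 0 as "unlimited". The sketch below assumes a hypothetical `-max-queue-size` flag parsed with Go's standard `flag` package; `parseMaxQueue` and `describe` are illustrative helpers, and Lingo's actual configuration mechanism (flags, environment variables, or otherwise) may differ.

```go
package main

import (
	"flag"
	"fmt"
)

// parseMaxQueue reads a hypothetical -max-queue-size flag.
// The default of 0 keeps the current behavior: an unbounded queue.
func parseMaxQueue(args []string) (int, error) {
	fs := flag.NewFlagSet("lingo", flag.ContinueOnError)
	maxQueue := fs.Int("max-queue-size", 0, "maximum number of queued requests; 0 means unlimited")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	if *maxQueue < 0 {
		return 0, fmt.Errorf("max-queue-size must be >= 0, got %d", *maxQueue)
	}
	return *maxQueue, nil
}

// describe renders a limit for logs, spelling out the unlimited case.
func describe(limit int) string {
	if limit == 0 {
		return "unlimited"
	}
	return fmt.Sprintf("%d requests", limit)
}

func main() {
	for _, args := range [][]string{nil, {"-max-queue-size=5000"}} {
		limit, err := parseMaxQueue(args)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(describe(limit)) // "unlimited", then "5000 requests"
	}
}
```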
