listener backlog limit exceeded from a malicious user #878

na-Itms · 2024-02-29T13:39:33Z

Hello Graham, thanks for your fantastic work on mod_wsgi.

I am facing a usage issue and would appreciate some input on my setup. We are running a Python application (namely, Trac) behind Apache with mod_wsgi in daemon mode. This application is aimed at a fairly small number of users (we expect at most 40-ish unique users using the website simultaneously).

Unfortunately, we are being targeted by a malicious user who is repeatedly accessing Trac, especially its slowest-loading paths. This causes mod_wsgi to fail (in under an hour after restarting Apache) with the following:

(11)Resource temporarily unavailable: mod_wsgi: Unable to connect to WSGI daemon process 'trac' on '/var/run/apache2/wsgi.redacted.sock' after multiple attempts as listener backlog limit was exceeded.

The server itself is far from being overwhelmed, CPU and memory usage remain low. Other applications served by Apache keep functioning normally.

I have successfully set up mod_qos to limit the request rate from attackers (I correctly see the attacker being limited in the logs). However, this doesn't prevent mod_wsgi's backlog limit from being exceeded. Thus, I assume that my WSGI daemon configuration is faulty, as the current rate of requests could be a reasonable rate for a service run on a larger scale.

I have increased net.core.somaxconn to 1024, and I currently run the following mod_wsgi configuration:

WSGIDaemonProcess trac threads=8 processes=2 listen-backlog=1024 queue-timeout=30 graceful-timeout=30 maximum-requests=1000
WSGIProcessGroup trac

I do not fully understand how the backlog limit can be hit when the maximum number of requests is set below the backlog depth.
I think I understand that, since the attacker keeps sending requests when the backlog is full, this situation becomes unrecoverable. However, I would have expected queue-timeout to alleviate the issue.

I have added restart-interval=1800 as a temporary workaround, but that doesn't fix anything.

Thanks in advance for your help and input, in the hope that the correct fix can also be useful for future users.


GrahamDumpleton · 2024-02-29T20:20:52Z

Unfortunately, queue-timeout only discards requests after they have made it through the listen backlog and the timeout has been exceeded. There is no way to cancel a request while it is stuck in the listen backlog.

The only thing I can think of immediately is to run multiple daemon process groups and separate out the problematic long-running request URLs so they are handled by a separate daemon process group, which can be configured differently.

http://blog.dscpl.com.au/2014/02/vertically-partitioning-python-web.html

With them separated you may be able to use request-timeout, but that is a rather brute-force method of forcing a restart of the daemon process when overloaded, and it may affect normal usage if long-running requests are a normal thing.
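
As a rough sketch of that partitioning, assuming Trac is mounted at /trac and that /trac/changeset is one of the slow paths (the group name trac-slow, the script path, the URL, and every number below are hypothetical placeholders, not values taken from this thread):

```apache
# Illustrative only: names, paths, and values are assumptions to adapt.

# Main group for normal Trac traffic.
WSGIDaemonProcess trac processes=2 threads=8 queue-timeout=30

# Separate, smaller group for the expensive URLs. request-timeout is set
# only here, so a forced restart does not affect the main group.
WSGIDaemonProcess trac-slow processes=1 threads=4 queue-timeout=15 request-timeout=60

WSGIScriptAlias /trac /path/to/trac.wsgi

<Location /trac>
    WSGIProcessGroup trac
</Location>

# Later Location sections override earlier ones, so requests under this
# prefix are delegated to the separate group instead.
<Location /trac/changeset>
    WSGIProcessGroup trac-slow
</Location>
```

With a layout along these lines, a flood of requests to the slow path should fill only the backlog of trac-slow, while the rest of Trac keeps its own daemon processes and socket.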

Also, increasing the listen backlog may actually be the wrong thing to do and could just exacerbate things. It may be better to use a smaller listen backlog and also experiment with connect-timeout and queue-timeout together.
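
A possible starting point for that kind of experiment is sketched below. The specific values are guesses chosen for illustration and would need tuning against real traffic; only the option names come from mod_wsgi.

```apache
# Illustrative values only; tune against real traffic.
# listen-backlog:  keep the socket queue short so excess requests fail fast
#                  instead of piling up where they cannot be cancelled.
# connect-timeout: how long the Apache child keeps trying to connect to the
#                  daemon socket before giving up.
# queue-timeout:   discard requests that waited too long before a daemon
#                  thread picked them up.
WSGIDaemonProcess trac processes=2 threads=8 listen-backlog=100 connect-timeout=10 queue-timeout=15 graceful-timeout=30 maximum-requests=1000
WSGIProcessGroup trac
```

The idea is that, with only 16 request threads in total (2 processes of 8 threads), a backlog of 1024 can hold far more requests than the daemon can serve before clients give up, so a shorter queue combined with shorter timeouts sheds load earlier.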

Because WSGI uses a synchronous model and relies on multithreading, there is nothing that can easily be done to handle very large numbers of concurrent requests.

Anyway, will have a think about it and see if there is anything else I can suggest.
