mod_tile appears to require a single-process (but can handle multiple threads) Apache server to work correctly #238

alankila · 2021-06-04T10:37:54Z

I struggled for some days with a problem where, under heavy load, mod_tile did not seem to return a tile promptly to the waiting HTTP client connection. The tile was in fact rendered, but the HTTP connection appeared to be stuck waiting for an acknowledgement from renderd, I think.

I read the code a bit and came across this comment in mod_tile/includes/protocol.h:

A client may not bother waiting for a response if the render daemon is too slow
causing responses to get slightly out of step with requests.

This gave me cause to investigate the possibility that, in a multiprocessing Apache, multiple independent HTTP server processes have connected to renderd and are each writing render commands and reading responses. My hypothesis is that a single socket is shared by all connected renderd clients. In that case, if a server process that did not submit a given render request reads the response, it simply throws it away, because it cannot notify a thread in another process. The Apache process whose thread did submit the request then stays stuck waiting for it until something times out.
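To make the hypothesis concrete, here is a minimal sketch in C of what would happen if two requesters really did share one connected stream socket. It is an illustration only: `struct demo_msg` and `shared_socket_misdelivery` are made up for this example and are not mod_tile's protocol or code, and nothing here shows that mod_tile actually shares a socket this way. The sketch runs in a single process and uses `socketpair` to stand in for the client and daemon ends.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fixed-size message for this sketch; NOT mod_tile's struct protocol. */
struct demo_msg {
    int x, y, z;
};

/* Write exactly len bytes; returns 0 on success, -1 on error. */
static int write_full(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; returns 0 on success, -1 on error or EOF. */
static int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0) return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Two logical clients, A and B, share ONE connected stream socket.
 * Returns 1 if client A received a reply that does not match its request,
 * 0 if the reply matched, -1 on I/O error.
 */
int shared_socket_misdelivery(void)
{
    int sv[2];
    int result = -1;
    struct demo_msg req_a = {1, 1, 10}, req_b = {2, 2, 10};
    struct demo_msg got_1, got_2, reply;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) return -1;
    /* sv[0] is used by BOTH logical clients; sv[1] is the daemon end. */

    /* Both clients submit a request on the same stream. */
    if (write_full(sv[0], &req_a, sizeof req_a) < 0) goto out;
    if (write_full(sv[0], &req_b, sizeof req_b) < 0) goto out;

    /* The daemon receives them in the order sent (the stream is sequenced)... */
    if (read_full(sv[1], &got_1, sizeof got_1) < 0) goto out;
    if (read_full(sv[1], &got_2, sizeof got_2) < 0) goto out;
    assert(got_1.x == req_a.x && got_2.x == req_b.x);

    /* ...but finishes B's tile first, so B's reply is written first. */
    if (write_full(sv[1], &got_2, sizeof got_2) < 0) goto out;
    if (write_full(sv[1], &got_1, sizeof got_1) < 0) goto out;

    /* Client A reads next and gets whatever is at the head of the stream. */
    if (read_full(sv[0], &reply, sizeof reply) < 0) goto out;
    result = (reply.x != req_a.x || reply.y != req_a.y || reply.z != req_a.z);

out:
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

Calling `shared_socket_misdelivery()` from a small `main` returns 1 here, because A reads B's reply. A reader that compares the reply's coordinates with its own request can at least detect the mismatch, but it cannot hand the reply to the requester that is still waiting for it.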

I have not conclusively proven that this is indeed what is happening, but I got rid of my slowly rendered tiles by enabling mpm_event and setting it up in such a way that it starts only 1 process. ThreadLimit, MaxRequests, etc. were all set to 150 so that Apache will not fork more than 1 worker process. I presume that as long as there is only 1 Apache process serving multiple threads, it is able to discover which thread is waiting for the tile rendering acknowledgement.

It is also worth noting that my mod_tile configuration is as follows:

ModTileTileDir /var/cache/renderd/tiles
LoadTileConfigFile /etc/renderd.conf
ModTileEnableStats On
ModTileRequestTimeout 60
ModTileMissingRequestTimeout 60
ModTileRenderdSocketName /run/renderd/renderd.sock

i.e. I have set long RequestTimeout and MissingRequestTimeout values. My clients are willing to wait as long as it takes for the tile to arrive, rather than giving up sooner. It may be that the shorter 3 and 10 second timeout values somewhat mask the problem. I think the request does return the tile's data if renderd renders the tile before the timeout; it just takes unnecessarily long to get the tile. In my case the timeout was so long that I was motivated to figure out why this kept happening.

alankila · 2021-06-04T10:47:12Z

I am also happy to report that this setup fixes the oddly low CPU usage I have seen with renderd. If I set it to use 8 threads and scroll around in an unseen map region, the renderd process reaches 800 % CPU usage.

xamanu · 2021-08-04T10:50:27Z

It would be nice to document your findings.

stephankn · 2021-08-09T05:58:32Z

@alankila Do I understand correctly that, in your analysis, the render socket lacks session handling? I would like to understand your problem analysis better.

Here the server opens a Unix domain socket:

mod_tile/src/daemon.c

Line 532 in f28cda9

fd = socket(PF_UNIX, SOCK_STREAM, 0);

This uses PF_UNIX and SOCK_STREAM. To my understanding, each subsequent accept call yields a unique file descriptor, managed by the kernel, for that connection.

mod_tile/src/daemon.c

Line 237 in f28cda9

incoming = accept(listen_fd, (struct sockaddr *) &in_addr, &in_addrlen);

The man page confirms that SOCK_STREAM is connection-based, which means each side has a well-known communication partner.

https://man7.org/linux/man-pages/man2/socket.2.html

   SOCK_STREAM
          Provides sequenced, reliable, two-way, connection-based
          byte streams.

So at least at the socket/connection level, the rendering daemon should know which response belongs to which request.

The loop calls rx_request to process a command:
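As a sanity check on that reasoning, here is a small self-contained C sketch of the general SOCK_STREAM behaviour. It is not taken from mod_tile: the function names, the one-byte "tag" protocol and the socket path passed in are all assumptions made for the demo. Two clients connect to one listening Unix socket, the server side accepts both, and each reply is written on the accepted descriptor the request arrived on. As far as I know, on Linux a blocking `connect` on a Unix stream socket returns once the connection is queued in the listen backlog, which is why the sketch can run in a single thread.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Connect a new client to the listener at addr and send a one-byte tag. */
static int connect_and_tag(const struct sockaddr_un *addr, char tag)
{
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    if (connect(fd, (const struct sockaddr *)addr, sizeof *addr) < 0 ||
        write(fd, &tag, 1) != 1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Server side: read the tag from one accepted connection, echo it on the same fd. */
static int echo_tag(int conn_fd)
{
    char tag;
    if (read(conn_fd, &tag, 1) != 1) return -1;
    return write(conn_fd, &tag, 1) == 1 ? 0 : -1;
}

/*
 * Returns 0 if each client read back its own tag, 1 if replies were mixed up,
 * -1 on setup or I/O error. 'path' is a hypothetical scratch socket path.
 */
int per_connection_replies(const char *path)
{
    struct sockaddr_un addr;
    int listen_fd, client_a = -1, client_b = -1, conn_1 = -1, conn_2 = -1;
    int result = -1;
    char got_a = 0, got_b = 0;

    assert(strlen(path) < sizeof addr.sun_path);
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    unlink(path); /* remove a stale socket file from an earlier run */

    listen_fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (listen_fd < 0) return -1;
    if (bind(listen_fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(listen_fd, 8) < 0) goto out;

    /* Two independent clients, each with its own connection and request. */
    client_a = connect_and_tag(&addr, 'A');
    client_b = connect_and_tag(&addr, 'B');
    if (client_a < 0 || client_b < 0) goto out;

    /* Each accept() returns a distinct descriptor for one connection. */
    conn_1 = accept(listen_fd, NULL, NULL);
    conn_2 = accept(listen_fd, NULL, NULL);
    if (conn_1 < 0 || conn_2 < 0) goto out;

    /* Reply in reverse order of acceptance to show that order does not matter. */
    if (echo_tag(conn_2) < 0 || echo_tag(conn_1) < 0) goto out;

    if (read(client_a, &got_a, 1) != 1 || read(client_b, &got_b, 1) != 1) goto out;
    result = (got_a == 'A' && got_b == 'B') ? 0 : 1;

out:
    if (client_a >= 0) close(client_a);
    if (client_b >= 0) close(client_b);
    if (conn_1 >= 0) close(conn_1);
    if (conn_2 >= 0) close(conn_2);
    close(listen_fd);
    unlink(path);
    return result;
}
```

Called with a scratch path such as `/tmp/demo_renderd.sock` (an arbitrary name chosen for the demo), the function returns 0: each client gets its own reply even though the server answers out of order. This only demonstrates the kernel-level behaviour of separate connections; it says nothing about what happens if several requesters end up using the same connected descriptor.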

mod_tile/src/daemon.c

Line 276 in acb1180

enum protoCmd rsp = rx_request(&cmd, fd);

rx_request keeps the connection details in the item data structure:

mod_tile/includes/gen_tile.h

Line 30 in acb1180

struct item {

So I do not yet understand where the connection detail would get lost. A client sending a request should get exactly its own answer back, as the connection is kept in the item structure.

Can you provide a reproduction case, maybe a synthetic one, that shows the problem? You mentioned that the "missing" requests had actually completed, so can we exclude error cases? What environment are you running in? Do you have AppArmor/SELinux rules in place?
