kiwix-serve ZIM fd needs to be smarter #142

kelson42 · 2017-12-31T20:41:46Z

Current, kiwix-serve opens all the ZIM files specified at start and keeps a fd open for each of them (without counting the potential fd for Xapian). By doing this, we can be sure that at any time we can directly read a content of any ZIM file quickly - without initialising the ZIM reader.

The problems are:

Per default a Linux system can usually only open 1024 file description per process
The memory consumption can be quite high even if nobody access anything.

That's why I propose to:

Limit the size of the pool of ZIM file descriptor (per default I propose 500)
Make this configurable in with a command line parmater
Implement the pool with LRU algorithm
Load dynamically ZIM files if needed

This ticket is linked to #143

kelson42 · 2019-04-16T19:25:28Z

@mgautierfr You have work on this - or somthing related - today. Any feedback about this ticket?

kelson42 · 2020-08-15T15:22:34Z

We are hitting the problem of too many fd open:

Internal Server Error

An internal server error occured. We are sorry about that :/

error 24 opening file "/var/www/library.kiwix.org/zim/wiktionary/wiktionary_ru_all_maxi_2020-07.zim

mgautierfr · 2020-08-17T09:18:37Z

With #353 we are not opening the zim file at startup.

We only create the reader when we need it but :

multi-zim search will open all books.
we have no limits on the cache size.

kelson42 · 2020-08-17T09:23:17Z

@mgautierfr This is a good precision you made here indeed. We try as well to fix the problem at the system level, cf. kiwix/container-images#148. But this is only temporary and for us only.

kelson42 · 2021-12-05T13:05:25Z

@mgautierfr @veloman-yunkan I think it is time here to move on this. Beside what has already been reported as problems, the goal is to limit the overall/optimize the overall memory usage. Somehow the concept of pool and pool size might be needed. Regarding the multizim search, it is quite clear to me that, to keep it usable, we will have to limit this function to a subset of the available zim files. Maybe this maximal subset side should be the same as the pool size (or a derivative of it)?

Anyway, we need to clairfy the approach before implementing it.

kelson42 · 2021-12-11T13:42:56Z

@mgautierfr told me there is a possibility that kiwix/libkiwix#620 just solves the problem by introducing a pool of active readers.

@veloman-yunkan @maneeshpm @veloman-yunkan Would be great to confirm before starting anything else on this ticket.

kelson42 · 2022-10-24T19:25:06Z

@mgautierfr @veloman-yunkan I wonder if such an improvement would really bring much now that multizim search works fine. What do you think?

veloman-yunkan · 2022-10-31T09:54:39Z

I believe the following discussion in the scope of kiwix/libkiwix#729 is relevant to this ticket:

kiwix/libkiwix#729 (review)
kiwix/libkiwix#729 (comment)
kiwix/libkiwix#729 (comment)

kelson42 · 2022-11-02T16:55:21Z

@veloman-yunkan I’m not sure what to think or what to propose. Do you think something should/could be done? If « yes », what and why?

kelson42 · 2022-11-09T06:41:03Z

@veloman-yunkan ?

veloman-yunkan · 2022-11-17T07:49:08Z

@kelson42 I stick to my previous comment:

I think that the main requirement must be ensuring that the same ZIM file is never opened more than once at the same time, otherwise it will result in additional pressure on RAM by the cluster caches (since the searcher isn't limited to reading the xapian database via a direct reference but also accesses article content during snippet generation). To this end the archive cache system must keep track of all zim::Archives that are currently alive. Then we can think how to build the searcher cache on top of the archive cache.

mgautierfr · 2023-02-10T17:35:36Z

After discussion:

Most of the time, there is only 2 or 3 fd open per zim files (one for the archive, one for the fulltext search database and one for the suggestion search database)
When doing multizim search, we add another opened fd per zim searched (fulltext database are not shared between searcher). This is not a easy problem as fulltext database are not threadsafe. So if we share the database, we have to lock other searchers sharing the same database.
As Searcher keep a reference to the archive, we may have situation when we drop the archive from the cache but keep the searcher in it. When we want to open the archive again, we don't find it in the cache and so we open it again (and so we have two opened archive for the same zim file)
By default the cache is calculated to a 1/10 of the number of books. So if you have a small number of books, we only have a cache of one book.

So we have to do:

Avoid to open several times a Archive (fix point 3)
Set a minimum cache size to something more that 1 (10 is probably a good number)

The point 2 seems a bit tricky about the threadsafe lock. Not sure we want to do that now. (Unless you have already a idea of how to fix that)

kelson42 · 2023-09-30T13:15:21Z

@mgautierfr Thank you for the summary and the action plan. This is ready to implement.

kelson42 added enhancement kiwix-serve labels Dec 31, 2017

kelson42 mentioned this issue Dec 31, 2017

kiwix-serve index page computation is too expensive #143

Closed

kelson42 assigned mgautierfr Apr 16, 2019

kelson42 mentioned this issue Aug 19, 2020

Optimization of zim::Cache openzim/libzim#385

Closed

kelson42 added this to To do in Better library Oct 29, 2020

kelson42 unassigned mgautierfr Mar 6, 2021

kelson42 pinned this issue Aug 20, 2021

kelson42 moved this from To do to In progress in Better library Dec 5, 2021

kelson42 assigned veloman-yunkan Dec 5, 2021

kelson42 assigned mgautierfr Dec 13, 2021

kelson42 added this to the 3.3.0 milestone Dec 28, 2021

kelson42 modified the milestones: 3.3.0, 3.4.0 Jun 2, 2022

kelson42 modified the milestones: 3.3.1, 3.4.0 Sep 24, 2022

kelson42 added the question label Oct 24, 2022

kelson42 modified the milestones: 3.5.0, 3.4.0 Oct 29, 2022

kelson42 modified the milestones: 3.4.0, 3.5.0 Nov 17, 2022

kelson42 modified the milestones: 3.5.0, 3.6.0 Feb 10, 2023

kelson42 unassigned mgautierfr Sep 30, 2023

kelson42 modified the milestones: 3.6.0, 3.7.0 Sep 30, 2023

kelson42 modified the milestones: 3.7.0, 3.8.0 Mar 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

kiwix-serve ZIM fd needs to be smarter #142

kiwix-serve ZIM fd needs to be smarter #142

kelson42 commented Dec 31, 2017 •

edited

kelson42 commented Apr 16, 2019

kelson42 commented Aug 15, 2020

mgautierfr commented Aug 17, 2020

kelson42 commented Aug 17, 2020

kelson42 commented Dec 5, 2021 •

edited

kelson42 commented Dec 11, 2021

kelson42 commented Oct 24, 2022 •

edited

veloman-yunkan commented Oct 31, 2022

kelson42 commented Nov 2, 2022

kelson42 commented Nov 9, 2022

veloman-yunkan commented Nov 17, 2022

mgautierfr commented Feb 10, 2023

kelson42 commented Sep 30, 2023

kiwix-serve ZIM fd needs to be smarter #142

kiwix-serve ZIM fd needs to be smarter #142

Comments

kelson42 commented Dec 31, 2017 • edited

kelson42 commented Apr 16, 2019

kelson42 commented Aug 15, 2020

mgautierfr commented Aug 17, 2020

kelson42 commented Aug 17, 2020

kelson42 commented Dec 5, 2021 • edited

kelson42 commented Dec 11, 2021

kelson42 commented Oct 24, 2022 • edited

veloman-yunkan commented Oct 31, 2022

kelson42 commented Nov 2, 2022

kelson42 commented Nov 9, 2022

veloman-yunkan commented Nov 17, 2022

mgautierfr commented Feb 10, 2023

kelson42 commented Sep 30, 2023

kelson42 commented Dec 31, 2017 •

edited

kelson42 commented Dec 5, 2021 •

edited

kelson42 commented Oct 24, 2022 •

edited