
Add an option to make a warm-up of lower/upper KB databases at startup #143

Open
oterrier opened this issue May 17, 2022 · 1 comment

Comments

@oterrier

Having most of the LMDB pages loaded in memory speeds up processing a lot, even though it requires a lot of RAM.
Maybe we could add an option (in the config files) or a REST endpoint to force the full (or almost full?) loading of some LMDB databases for a given language (the ones that are required to do a disambiguation):

  • pages db?
  • concepts db?
  • labels db?

This should be as simple as opening a cursor on every LMDB db and iterating to the end.
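
The cursor walk described above could look roughly like the sketch below. It is not taken from the project: the class name `LmdbWarmUp` and the method `warmUp` are hypothetical, and a plain `Iterator` of key/value byte arrays stands in for a real LMDB cursor, so the actual binding calls would still have to be wired in.

```java
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical helper: LmdbWarmUp and warmUp are illustrative names.
public class LmdbWarmUp {

    /**
     * Walks every entry once and returns the total number of key and value
     * bytes seen. The iterator is a stand-in for an LMDB cursor (assumption):
     * with a real cursor, reading each entry is what pulls its pages into memory.
     */
    public static long warmUp(Iterator<Map.Entry<byte[], byte[]>> cursor) {
        long bytesTouched = 0;
        while (cursor.hasNext()) {
            Map.Entry<byte[], byte[]> entry = cursor.next();
            bytesTouched += entry.getKey().length + entry.getValue().length;
        }
        return bytesTouched;
    }

    public static void main(String[] args) {
        // Tiny in-memory stand-in for a database.
        List<Map.Entry<byte[], byte[]>> fakeDb = List.of(
            Map.entry("a".getBytes(StandardCharsets.UTF_8), "alpha".getBytes(StandardCharsets.UTF_8)),
            Map.entry("b".getBytes(StandardCharsets.UTF_8), "beta".getBytes(StandardCharsets.UTF_8)));
        System.out.println("bytes touched: " + warmUp(fakeDb.iterator()));
    }
}
```

Returning the byte count gives a cheap way to log how much data each warm-up pass actually covered per database.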

What do you think?

Best regards

Olivier Terrier

@oterrier

Hi Patrice,

Not sure if we should preload the whole dbs (too much memory involved) or just a subset, like the N most frequently used entries?
Looking at the code of com.scienceminer.nerd.utilities.WikipediaLabelIDF, I can see that you already have the occurrence count stored in the LabelDatabase, so for this one it should be easy.
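
For the labels case, picking the N most frequent entries from stored occurrence counts could be sketched as below. This is only an illustration: `TopLabels` and `topN` are hypothetical names, and a plain `Map<String, Integer>` stands in for whatever the LabelDatabase actually exposes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper: TopLabels and topN are illustrative names.
public class TopLabels {

    /**
     * Returns the n labels with the highest occurrence count, most frequent first.
     * Uses a min-heap bounded to n entries, so memory stays O(n) even when the
     * label set is very large. Order among equal counts is unspecified.
     */
    public static List<String> topN(Map<String, Integer> counts, int n) {
        List<String> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        PriorityQueue<Map.Entry<String, Integer>> heap =
            new PriorityQueue<>(Map.Entry.<String, Integer>comparingByValue());
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            heap.offer(entry);
            if (heap.size() > n) {
                heap.poll(); // drop the current least frequent label
            }
        }
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey()); // ascending by count
        }
        Collections.reverse(result); // most frequent first
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = Map.of("the", 900, "paris", 50, "lyon", 7);
        System.out.println(topN(counts, 2));
    }
}
```

The returned labels would then be the keys to look up once at startup, which touches only the pages that hold the most used entries.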

But I don't know how to proceed for the others (PageDb, etc.).

Maybe a persistent EHCache with an LFU policy could be associated with every KBDatabase, so that the N most frequent entries (just the keys anyway) could be stored and retrieved at startup?
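
To make the idea concrete without committing to EHCache, here is a minimal sketch that swaps in a plain in-memory hit counter. `HotKeyTracker` and its methods are hypothetical names, keys are assumed to be strings without line breaks, and the one-key-per-line file format is just an assumption for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.Collectors;

// Hypothetical helper, one instance per database; not an EHCache wrapper.
public class HotKeyTracker {
    private final Map<String, LongAdder> hits = new ConcurrentHashMap<>();

    /** Call on every lookup to count how often a key is requested. */
    public void recordAccess(String key) {
        hits.computeIfAbsent(key, k -> new LongAdder()).increment();
    }

    /** Returns up to n keys, most frequently accessed first. */
    public List<String> hottest(int n) {
        return hits.entrySet().stream()
            .sorted((a, b) -> Long.compare(b.getValue().sum(), a.getValue().sum()))
            .limit(n)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    /** Writes the n hottest keys, one per line (e.g. at shutdown). */
    public void save(Path file, int n) throws IOException {
        Files.write(file, hottest(n), StandardCharsets.UTF_8);
    }

    /** Reads the keys saved by a previous run; empty if no file exists yet. */
    public static List<String> load(Path file) throws IOException {
        return Files.exists(file)
            ? Files.readAllLines(file, StandardCharsets.UTF_8)
            : List.of();
    }
}
```

At startup, the keys returned by `load` would be looked up once in the corresponding database, which is the only step that actually warms the pages. Only keys are persisted, never values, as suggested above.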

Just some thoughts

Best regards

Olivier
