
Support for LRU with expire #18

Open
rohitjoshi opened this issue Dec 13, 2019 · 3 comments
@rohitjoshi

Thanks for the high-performance sharedhashfile. Any plans to support LRU with expiry, similar to the Nginx shared cache?

@rohitjoshi changed the title from "support for LRU with expire" to "Support for LRU with expire" on Dec 13, 2019
@simonhf
Owner

simonhf commented Jan 23, 2020

Hi Rohit and thanks for the request.

Regarding expire:

sharedhashfile has been used for passive / lazy expiry (expire upon read) of keys: the caller can achieve this by embedding some kind of timestamp in the value.

Regarding LRU:

This could be achieved by the caller by embedding a doubly linked list in the values, so that all keys are always in chronological order. It would then be easy to find the least recently used key and delete it. But this algorithm would be bad for size and performance: all values would get larger, and there would likely be a locking performance hit, although perhaps that could be eliminated by limiting the LRU doubly linked list to an individual SHF shard (which contains no internal locking).

Another strategy to achieve something similar, but faster, purely in caller code is to have a separate process -- accessing sharedhashfile in parallel -- which reads through all keys in consecutive order in memory (much faster than regular key access, and working in parallel with the regular processes accessing keys) and actively expires keys. I have used a technique like this before both to actively expire keys and to efficiently persist a set of keys held purely in RAM to SSD, without compromising the in-memory key access performance of the other processes.

How would this caller-implemented technique work with LRU? As the keys are consecutively accessed in memory, the continually (but likely throttled) scanning process adds the oldest, e.g., n% of keys to a list of oldest keys. Note: only the shorter key UID is added to the list, not the key itself. This list of oldest keys can be used later to purge older keys on demand. However, with this algorithm the absolute least recently used key is not necessarily purged; only a similarly old, least recently used key is purged. This may or may not fit your use case.

Hope this helps and looking forward to any questions!

@kolinfluence

> This could be achieved by the caller by embedding a doubly linked list in the values, so that all keys are always in chronological order. It would then be easy to find the least recently used key and delete it. But this algorithm would be bad for size and performance: all values would get larger, and there would likely be a locking performance hit, although perhaps that could be eliminated by limiting the LRU doubly linked list to an individual SHF shard (which contains no internal locking).

Actually, this is a "better" way to do it, though I'm not too sure how the SHF shards work.

Would it be possible to introduce this with Golang? I don't mind the lock.

@simonhf
Owner

simonhf commented Dec 10, 2022

> not too sure how the SHF shard works though

FYI there is some explanation of the internals here [1] including the sharding.

[1] https://github.com/simonhf/sharedhashfile/blob/master/src/shf.h#L95
