Persistent storage ? #19

Closed
zilveer opened this issue Apr 28, 2020 · 1 comment
zilveer commented Apr 28, 2020

Hi,
Is there any options for persistent storage ?

Regards

simonhf (Owner) commented May 1, 2020

Hi @zilveer, thanks for the question!

There are a few options, depending upon the performance requirements of your application.

Option 1: Highest performance for mostly-read applications.

The easiest way would be not to use /dev/shm as the root for the memory mapped files used by SharedHashFile, but to point to, e.g., an SSD path instead. However, this comes with some performance problems which may or may not be an issue for your application. The performance issues arise from how the kernel internally syncs changed memory map pages in RAM with the pages in the backing store for the memory mapped file. There is no way -- from the userland application -- to predict when the kernel will sync a changed memory map page back to disk, or to force it to. Although this syncing happens in the 'background', it uses resources and can heavily affect performance. But if your application is very read heavy at run-time, this might not even be an issue for you, because too little syncing will happen -- due to infrequent database updates -- to cause a performance problem.

Option 2: Highest general performance for any read / write pattern.

Another mechanism is the general approach described in [1]: use /dev/shm as usual, but have a special process accessing SHF which iterates through the database, saving as it goes; effectively making the database persistent in the background. In the background saving process, you can use SHF internals to loop through all the memory mapped files, which can be copied from RAM to RAM while write locked; the in-memory clone of each memory mapped file can then be persisted to disk or elsewhere completely in the background without affecting database performance. Let's say you create a clone of the /dev/shm SHF memory mapped files on SSD somewhere. If the box is rebooted, then ideally upon start-up you copy the clone back into /dev/shm and restart SHF to get the database back.

Option 3: SharedHashFile agnostic solution.

In this option, whenever you update a key in the database, you also write the update to a log file. If at a later date you want to restore the database, you read the log file and 'replay' all the key updates. Using this mechanism you'll probably need some extra business logic to consolidate log files over time, so that upon start-up you grab a copy of the last known version of the database and replay only a limited set of log files to bring it up to the present.

[1] #18 (comment)

@simonhf simonhf closed this as completed Dec 10, 2020