Scalability #2
Indeed, I don't have a concrete plan to mitigate this yet. One approach is to delete old entries: entries that have been read by everyone, entries that have been overwritten by a new value, or entries that have simply aged (for example, the read status of very old RSS articles is irrelevant). We would then have to store a datetime instead of the number of bytes read, which makes skipping files less efficient. Maybe there is a way to still use bytes?
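For context on why byte offsets are efficient: a reader can jump straight to the unread tail of a file with a single seek, whereas a datetime-based scheme would have to scan entries until it finds the first new one. A minimal sketch of the byte-offset approach (the file layout and function name here are assumptions for illustration, not DecSync's actual format):

```python
import io

def read_new_entries(f: io.IOBase, bytes_read: int) -> bytes:
    """Skip already-processed data in O(1) using a stored byte offset."""
    f.seek(bytes_read)   # jump straight past everything read before
    return f.read()      # only the new, unread tail remains

buf = io.BytesIO(b"old entries\nnew entry\n")
print(read_new_entries(buf, 12))  # b"new entry\n"
```

With a datetime instead, there is no fixed mapping from "entries newer than T" to a file position, so the reader would have to parse entries from the start (or binary-search, if entries were fixed-size or indexed) to find where to resume.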
The size impact could be reduced by storing only diffs inside new-entries. Updating an entry in stored-entries then amounts to applying a diff to the existing entry. Since all values in the store are most likely text-based, diffs should work well.
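A minimal sketch of what such a text diff could look like, using Python's standard `difflib` (the edit-script format here is an assumption for illustration; a real format would also need to be serialized compactly):

```python
import difflib

def make_diff(old: str, new: str) -> list:
    """Compute a compact edit script turning old into new."""
    ops = []
    sm = difflib.SequenceMatcher(a=old, b=new)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            ops.append(("keep", i2 - i1))          # copy i2-i1 chars of old
        else:
            ops.append(("put", new[j1:j2], i2 - i1))  # emit new text, skip old
    return ops

def apply_diff(old: str, ops: list) -> str:
    """Reconstruct the new text by replaying the edit script against old."""
    out, pos = [], 0
    for op in ops:
        if op[0] == "keep":
            out.append(old[pos:pos + op[1]])
            pos += op[1]
        else:
            _, text, skip = op
            out.append(text)
            pos += skip
    return "".join(out)
```

For mostly-unchanged values, the "keep" runs dominate and the stored diff is much smaller than the full new value, which is exactly the saving proposed above.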
For deleting entries, you would only delete entries at the top of the file and store how many bytes have been deleted. When other apps read the file, they simply subtract this number from their stored read-bytes value and start at the new offset. Obviously, only entries that have already been read by all other apps can be deleted, otherwise information is lost. The only caveat is that this is incompatible with diffs.
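The offset bookkeeping this describes can be sketched in a few lines (the names `bytes_read` and `bytes_deleted` are illustrative; where the deleted-bytes counter would actually be stored is left open):

```python
def physical_offset(bytes_read: int, bytes_deleted: int) -> int:
    """Translate an app's logical read position into a position in the
    truncated file, after bytes_deleted bytes were trimmed from the top."""
    # Entries are only trimmed once every app has read them, so the
    # stored position can never fall inside the deleted region.
    assert bytes_read >= bytes_deleted, "app would lose unread data"
    return bytes_read - bytes_deleted

# An app that had read 500 bytes, after 120 bytes were trimmed,
# resumes at byte 380 of the truncated file.
print(physical_offset(500, 120))  # 380
```

The assertion encodes the constraint stated above: deletion is only safe for entries every app has already read, otherwise some reader's offset would point into data that no longer exists.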
I have thought a bit more about reducing the size of the files. Note that I have introduced versioning for DecSync, so incompatible changes are now possible. A few ideas:
I have now actually released a draft of version 2! The main motivation is that Android is moving to SAF (the Storage Access Framework), which is very slow. The main slowdown comes from having many different files spread over many directories, and this is mostly solved by making the structure much flatter. In addition, the format is simplified using the first 2 points given above, entries are duplicated less, and it is easier to remove old entries.
The 1-byte hash system seems a little too limiting in my opinion. With only 256 possible path hashes, wouldn't there be a high probability of collisions? Say you have only 20 paths; then the probability of at least one collision should already be at around 50%. Have I overlooked something here?
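The 50% estimate checks out: this is the birthday paradox with 256 "days". A quick way to verify it exactly:

```python
def collision_probability(num_paths: int, num_hashes: int = 256) -> float:
    """Birthday-paradox probability that at least two of num_paths paths
    share the same hash, given num_hashes equally likely hash values."""
    p_no_collision = 1.0
    for i in range(num_paths):
        # The (i+1)-th path must avoid the i hashes already taken.
        p_no_collision *= (num_hashes - i) / num_hashes
    return 1.0 - p_no_collision

print(round(collision_probability(20), 3))
```

With 20 paths and 256 buckets the result is just over one half, so the intuition in the comment is right: for a uniformly distributed 1-byte hash, collisions become likely at only a few dozen paths.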
This is a great idea, thanks for sharing. I'm mostly worried about the size of `new-entries`, which, according to my understanding, is always growing, containing the whole history of the database. Is there a plan to introduce some mitigation of this?