Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scalability #2

Open
cinghiopinghio opened this issue Dec 18, 2018 · 6 comments
Open

Scalability #2

cinghiopinghio opened this issue Dec 18, 2018 · 6 comments

Comments

@cinghiopinghio
Copy link

This is a great idea, thanks for sharing.
I'm mostly worried about the size of new-entries which, according to my understanding, is always growing, containing the whole history of the database.
Is there a plan to introduce some mitigation of this?

@39aldo39
Copy link
Owner

Indeed, new-entries contains the whole history of the database. However, since the number of read bytes are stored, larger files shouldn't slow it down too much. And by using .decsync-sequence files, unchanged directories are skipped. But it does always grow.

I don't have any concrete plan to mitigate this. But a way to do it is to delete old entries. This can be entries which are read by everyone, overwritten by a new value, or have aged (for example, very old RSS articles statuses are irrelevant). We then have to store the datetime instead of the number of bytes, but that makes skipping files less efficient. Maybe there is a way to still use bytes?

@Backfighter
Copy link

Backfighter commented Nov 19, 2019

The size impact could be reduced by only storing diffs inside new-entries. This means updating entries in stored-entries is simply applying a diff to the existing entry.

Since all values of the store are most likely text based, diffs should work well.

@Backfighter
Copy link

Backfighter commented Nov 19, 2019

In terms of deleting entries you would only delete entries at the top of the file and store how many bytes have been deleted. When other apps read the files they simply substract this number from the stored read-bytes and start at this new offset.

Obviously only entries can be deleted that have already been read by all other apps otherwise information is lost.

The only caveat is that this is incompatible with diffs.

@39aldo39
Copy link
Owner

I have thought a bit more about reducing the size of the files. Note that I have introduced versioning for DecSync, so incompatible changes are now possible. A few ideas:

  • read-bytes can be removed. Skipping some content of a file is not that much faster, and being able to remove values is definitely worth it. Its main advantage was skipping whole files, but using .decsync-sequence files (or similar) also achieves this.
  • new-entries can be removed as well. We only need to store stored-entries and just read the whole file when it changes. Multiple occurrences should only be executed once.
  • To reduce the size of the values (e.g. calendar events), we might even use references inside stored-entries that point to the original app that introduced the value. This does not introduce problems when that original value is removed, as that means it is either updated or has aged.

@39aldo39
Copy link
Owner

39aldo39 commented Aug 4, 2020

I have now actually released a draft of version 2!

The main motivation is that Android is moving to SAF, which is very, very slow. The main slowdown is in having a lot of different files inside a lot of directories. This is mostly solved by making the structure a lot flatter. In addition, it is simplified using the first 2 points given above, the entries are duplicated less and it is easier to remove old entries.

@Backfighter
Copy link

The 1 byte hash systems seems a little to limiting in my opinion. With only 256 possible path hashes wouldn't there be a high probability of collisions? Let's say you have only 20 paths, then the probability of at least one collision should already be at arround 50%. Have I overlooked something here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants