Duplicacy #14
I've updated the comparison table with your remarks.
Does Duplicacy have a preferred self-hosted backend?
I'm a bit puzzled. Since there are data chunks, there needs to be a description somewhere of how they are linked together... something like an index? For now, I've added the If I go the chunk-size route, I'll have to do this for all backup solutions.
Hi,
Indeed, the lack of an index or db is one of the most amazing design features of Duplicacy:
"What is novel about lock-free deduplication is the absence of a centralized indexing database for tracking all existing chunks and for determining which chunks are not needed any more. Instead, to check if a chunk has already been uploaded before, one can just perform a file lookup via the file storage API using the file name derived from the hash of the chunk. This effectively turns a cloud storage offering only a very limited set of basic file operations into a powerful modern backup backend capable of both block-level and file-level deduplication. More importantly, the absence of a centralized indexing database means that there is no need to implement a distributed locking mechanism on top of the file storage."
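To make the quoted idea concrete, here is a minimal sketch of index-free deduplication: the chunk's file name is derived from its hash, so "has this chunk been uploaded?" is just a file lookup. The local directory, the `upload_chunk` helper, and the use of sha256 are illustrative assumptions, not Duplicacy's actual storage layout.

```shell
# Sketch of lock-free dedup: no index, no db. A chunk's name is its hash,
# so existence in the store answers the dedup question directly.
store=$(mktemp -d)          # stand-in for the storage backend

upload_chunk() {
    name=$(printf '%s' "$1" | sha256sum | awk '{print $1}')
    if [ -e "$store/$name" ]; then
        echo "chunk $name already present, skipping"   # deduplicated
    else
        printf '%s' "$1" > "$store/$name"              # "upload"
        echo "chunk $name uploaded"
    fi
}

upload_chunk "some chunk of data"
upload_chunk "some chunk of data"   # second call hits the dedup path
```

Because the check is a plain lookup against the store itself, two clients backing up to the same target need no coordination beyond the storage's own file semantics.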
Hi @deajan,
Awesome work!
I took a look at your script and I have a few suggestions:
1-For backup and restore commands, please use the -threads option with 8 threads for your setup. It will significantly increase speed.
Increase -threads from 8 until you saturate the network link or see a decrease in speed.
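For example (the revision number below is a placeholder; run these from an initialized repository):

```shell
# Back up and restore with 8 worker threads.
duplicacy backup -threads 8
duplicacy restore -r 1 -threads 8
```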
2-During init, please play with the chunk size options:
-chunk-size, -c the average size of chunks (default is 4M)
-max-chunk-size, -max the maximum size of chunks (default is chunk-size*4)
-min-chunk-size, -min the minimum size of chunks (default is chunk-size/4)
With homogeneous data, you should see smaller backups and better deduplication. See Chunk size details.
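A hedged example of such an init; the snapshot id, storage URL, and the 1M average chunk size are placeholders to adapt to your own setup:

```shell
# Initialize an encrypted storage with a smaller average chunk size,
# which can improve dedup on homogeneous data at the cost of more chunks.
duplicacy init -e -c 1M -min 256K -max 4M mysnapshots b2://my-bucket
```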
3-Some clarifications for your shopping list on Duplicacy:
1-Redundant index copies: Duplicacy doesn't use indexes (or a database).
2-Continue restore on bad blocks in repository: yes; see also Erasure Coding.
3-Data checksumming: yes
4-Backup mounting as filesystem: No (there is a FUSE implementation PR, but it is not likely to land short term).
5-File includes / excludes based on regexes: yes
6-Automatically excludes CACHEDIR.TAG(3) directories: No
7-Is metadata encrypted too?: yes
8-Can encrypted / compressed data be guessed (CRIME/BREACH style attacks)?: No
9-Can a compromised client delete backups?: No (with a public key and an immutable target; requires target-side setup)
10-Can a compromised client restore encrypted data?: No (with a public key)
11-Does the backup software support pre/post execution hooks?: yes, see Pre Command and Post Command Scripts
12-Does the backup software provide a crypto benchmark?: there is a Benchmark command.
Important:
13-Duplicacy is serverless: lower cost, less maintenance, smaller attack surface.
This also means that Duplicacy will always be a bit slower, since it has to do a list lookup before it uploads a particular chunk.
14-Duplicacy works with a ton of storage backends: infinitely scalable and more secure.
15-No indexes or databases.
16-You should test partial restore.
17-Test data should be a little bit more diverse, but I guess this is difficult.
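On point 16, a partial restore can be exercised by passing include patterns after the options; the revision number and pattern below are placeholders:

```shell
# Restore only files matching a pattern from revision 1,
# instead of the whole snapshot.
duplicacy restore -r 1 -- +some/dir/*
```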
Hope this helps a bit. Feel free to join the Forum.
Keep up the good work.