
Duplicacy #14

Open
blackbit47 opened this issue Sep 7, 2022 · 2 comments

blackbit47 commented Sep 7, 2022

Hi @deajan,
Awesome work!

I took a look at your script and I have a few suggestions:

1-For backup and restore commands, please use the -threads option with 8 threads for your setup. It will significantly increase speed.

Increase -threads from 8 until you saturate the network link or see a decrease in speed.
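As a sketch, the flag would be passed like this (the revision number is a placeholder; check `duplicacy help backup` for the exact flags on your version):

```shell
# Hypothetical invocations; tune the thread count to your link and CPU.
duplicacy backup -stats -threads 8
duplicacy restore -r 1 -threads 8
```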

2-During init, please experiment with the chunk size:

-chunk-size, -c the average size of chunks (default is 4M)
-max-chunk-size, -max the maximum size of chunks (default is chunk-size*4)
-min-chunk-size, -min the minimum size of chunks (default is chunk-size/4)

With homogeneous data, you should see smaller backups and better deduplication. See the Chunk size details page.
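For example, a smaller 1M average chunk (with correspondingly scaled min/max) would be chosen at init time. The values, snapshot id, and storage URL below are placeholders, not recommendations:

```shell
# Chunk parameters are fixed when the storage is initialized and cannot
# be changed afterwards, so benchmark before settling on values.
duplicacy init -c 1M -min 256K -max 4M my-snapshot-id b2://my-bucket
```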

3-Some clarifications for your shopping list on Duplicacy:

1-Redundant index copies: Duplicacy doesn't use indexes (or a database).
2-Continue restore on bad blocks in repository: yes; see also Erasure Coding
3-Data checksumming: yes
4-Backup mounting as filesystem: No (there is a FUSE implementation PR, but it is unlikely to land short term)
5-File includes / excludes based on regexes: yes
6-Automatically excludes CACHEDIR.TAG(3) directories: No
7-Is metadata encrypted too?: yes
8-Can encrypted / compressed data be guessed (CRIME/BREACH style attacks)?: No
9-Can a compromised client delete backups?: No (with a public key and an immutable target; requires target setup)
10-Can a compromised client restore encrypted data?: No (with a public key)
11-Does the backup software support pre/post execution hooks?: yes, see Pre Command and Post Command Scripts
12-Does the backup software provide a crypto benchmark?: yes, there is a benchmark command.
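For reference, the benchmark is run from inside an initialized repository (see `duplicacy benchmark -h` for its tunables):

```shell
# Measures local disk read/write, chunk split/encrypt throughput, and
# upload/download speed against the configured storage backend.
duplicacy benchmark
```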

Important:

13-Duplicacy is serverless: less cost, less maintenance, less attack surface.
This also means that Duplicacy will always be a bit slower, since it has to do a storage listing before it uploads a particular chunk.
14-Duplicacy works with a ton of storage backends: infinitely scalable and more secure.
15-No indexes or databases.


16-You should test partial restores.
17-Test data should be a little more diverse. But I guess this is difficult.
Hope this helps a bit. Feel free to join the Forum.

Keep up the good work.

deajan added a commit that referenced this issue Sep 7, 2022

deajan commented Sep 7, 2022

I've updated the comparison table with your remarks.

13- Duplicacy is serverless: Less cost, less maintenance, less attack surface.
14- Duplicacy works with a ton of storage backends: Infinitely scalable and more secure.

Does Duplicacy have a preferred self-hosted backend?

15-No indexes or databases.

I'm a bit puzzled. Since there are data chunks, there needs to be a description somewhere of what they are linked to... something like an index...?

For now, I've added the -threads option for the next test round.

If I go the chunk size route, I'll have to do this for all backup solutions.

@blackbit47 (Author) commented:

Hi,

Indeed, the lack of an index or db is one of the most amazing design features of Duplicacy.
Let me quote from the Lock free deduplication algorithm page:

"What is novel about lock-free deduplication is the absence of a centralized indexing database for tracking all existing chunks and for determining which chunks are not needed any more. Instead, to check if a chunk has already been uploaded before, one can just perform a file lookup via the file storage API using the file name derived from the hash of the chunk. This effectively turns a cloud storage offering only a very limited set of basic file operations into a powerful modern backup backend capable of both block-level and file-level deduplication. More importantly, the absence of a centralized indexing database means that there is no need to implement a distributed locking mechanism on top of the file storage."
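The idea in that quote can be sketched in a few lines of shell (a simplified illustration, not Duplicacy's actual code): name each chunk after its content hash, and the "index lookup" becomes a plain file-existence check against the storage.

```shell
storage=$(mktemp -d)   # stands in for the remote file storage

upload_chunk() {
  # Content-addressed name, derived from the SHA-256 of the chunk data.
  name=$(printf '%s' "$1" | sha256sum | awk '{print $1}')
  if [ -e "$storage/$name" ]; then
    echo "skip $name"            # already uploaded: deduplicated for free
  else
    printf '%s' "$1" > "$storage/$name"
    echo "upload $name"
  fi
}

upload_chunk "chunk A"   # uploaded
upload_chunk "chunk B"   # uploaded
upload_chunk "chunk A"   # skipped -- no index consulted, just a file lookup
```

Because the check is a stateless storage lookup, several clients can back up to the same storage concurrently without coordinating through a shared database, which is why no distributed locking is needed.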
