introduce chunked backups #11900

Open
wants to merge 1 commit into
base: main

Roghetti

First time contributor checklist

Contributor checklist

  • Oneplus 3, Android 9
  • Virtual device, Android 9
  • Virtual device, Android 11
  • My contribution is fully baked and ready to be merged as is
  • I ensure that all the open issues my contribution fixes are mentioned in the commit message of my first commit using the Fixes #1234 syntax

Description

I work for bevuta IT GmbH, a company based in Germany. Our boss is a Signal user on Android. His problem is that he can't back up his Signal app data to the SD card and, despite 128 GB of internal memory, he has no free internal storage left, mainly because of the Signal backups. The most reasonable explanation is that his Android version doesn't support file systems other than FAT32 on SD cards. FAT32 has some limitations, such as a maximum file size of ~4 GB, and his Signal installation uses more than 4 GB. To us, using an SD card for backups also seems advantageous, as it is easier to get at the data if the device breaks down.

We discussed several solutions. The best one we came up with is to split the Signal backup into chunks smaller than 4 GB, if desired. I was tasked with implementing it. For this code to be usable in a real Signal installation, it has to be in the official release. So, here is my MR! :-)

@newhinton
Contributor

@Roghetti This sounds amazing, one feature i am really looking forward to!

I do have a question: could you explain how it is being chunked?
I had the idea some time ago that it would be amazing to create "yearly" backups to achieve the same. The reason is that it would allow a partial restore of a time range (without losing the rest, if the key does not change).

Also, do the chunks change between runs? E.g., does only the last chunk change because it is the one containing the new data?

@Roghetti
Author

Roghetti commented Jan 31, 2022

@Roghetti This sounds amazing, one feature i am really looking forward to!

Thanks! :-)

I do have a question: could you explain how it is being chunked?

It's just like a normal backup, but split into multiple files if a single file would exceed a certain size. E.g., if your normal backup is 1.5 GB, the chunked backup would create two files, one of 1 GB and one of 0.5 GB, as 1 GB is the threshold I chose.
If you concatenate these files in order, e.g. with cat on the command line, you get the same file that the 'normal' backup mechanism would give you.
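
A minimal sketch of the splitting described above, written in Python rather than taken from the PR. The function names and the scaled-down sizes are assumptions for illustration only; the PR itself uses a 1 GB threshold and writes real files.

```python
from typing import List


def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Split data into consecutive pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def join_chunks(chunks: List[bytes]) -> bytes:
    """Concatenate chunks in order, like `cat part1 part2 > backup`."""
    return b"".join(chunks)


# Scaled-down example: one byte stands in for one megabyte, so a
# "1.5 GB" backup is 1536 bytes and the "1 GB" threshold is 1024 bytes.
backup = bytes(range(256)) * 6
chunks = split_into_chunks(backup, 1024)
chunk_sizes = [len(c) for c in chunks]   # one full chunk, one remainder
restored = join_chunks(chunks)           # identical to the unsplit backup
```

Because the chunks are plain consecutive slices, restoring only requires reading them back in the same order.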

I had the idea some time ago that it would be amazing to create "yearly" backups to achieve the same. The reason is that it would allow a partial restore of a time range (without losing the rest, if the key does not change).
Also, do the chunks change between runs? E.g., does only the last chunk change because it is the one containing the new data?

All chunks are affected when certain data has changed.
As this chunking is like the normal backup, but with multiple files, which chunks change depends on the order in which the data is exported. The current code exports each table of the SQLite database one by one, e.g., first MMS, then SMS, then Reactions and so on. If you receive an MMS, all chunks from the point of the new MMS onward will change.
I like your idea of incremental backups, but it has to be addressed in a separate Pull Request and would affect the 'normal' and the chunked backup mechanism.
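
To illustrate why later chunks change, here is a small Python sketch. The table contents, sizes, and function names are made up for the example, and it ignores the encryption that the real backup applies; it only shows how one extra record early in the export shifts every fixed-size chunk after it.

```python
import hashlib
from typing import List


def chunk_digests(data: bytes, chunk_size: int) -> List[str]:
    """SHA-256 digest of each fixed-size chunk of data."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def changed_chunks(old: bytes, new: bytes, chunk_size: int) -> List[int]:
    """Indices of chunks that differ (or exist in only one export)."""
    a = chunk_digests(old, chunk_size)
    b = chunk_digests(new, chunk_size)
    longest = max(len(a), len(b))
    return [
        i for i in range(longest)
        if i >= len(a) or i >= len(b) or a[i] != b[i]
    ]


# Hypothetical export order: one table after another.
mms = b"M" * 10
sms = bytes(range(30))
reactions = bytes(range(100, 120))

old_export = mms + sms + reactions
new_export = mms + b"m" + sms + reactions   # one new MMS record appended

changed = changed_chunks(old_export, new_export, 10)
```

Only the chunk before the new record keeps its digest; every chunk from the insertion point onward differs, and the new export gains one extra chunk.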

@newhinton
Contributor

Ahh, that's a shame. I had hoped that this could reduce the data transferred each night and reduce wear on my flash storage. Thanks anyway!

@cody-signal
Contributor

Hey there, appreciate the PR. I'm not sure if we'll pull this in or not. I'll bring it up with the team.

@newhinton
Contributor

@cody-signal There is also a discussion in the forum regarding this. I think we need to address this somehow, because backups can easily grow beyond 4 GB, which is problematic for the various reasons mentioned in that discussion.

@Roghetti Roghetti force-pushed the backup-chunking branch 2 times, most recently from 6a92181 to bddab19 Compare March 2, 2022 09:20
@Roghetti
Author

Roghetti commented Mar 2, 2022

The following feature request is related: Backup to FAT card

@HyperCriSiS

Will the backups also be incremental?
The current implementation, which forces a full backup every day, is really dumb 🙁

@newhinton
Contributor

@HyperCriSiS Sadly no, this will only 'split' the backup-file into parts. I hope that they will switch to incremental backups eventually. There is a linked discussion.

However, the backup format should not be changed very often, so we should do it properly. I think this PR should not be merged, since it fixes the symptom (backups being over 4 GB) and not the issue (missing segmentation/incremental backups) of ballooning backup files.

@Roghetti Roghetti force-pushed the backup-chunking branch 4 times, most recently from dac29a5 to 7a45c82 Compare April 12, 2022 15:21
@Smojo

Smojo commented Apr 17, 2022

I think this PR should not be merged, since it fixes the symptom (backups being over 4 GB) and not the issue (missing segmentation/incremental backups) of ballooning backup files.

I don't agree with that. Chunking would help people whose devices have small internal storage but an SD card slot, once the Signal container grows beyond 4 GB and, most probably, the backup file with it. Without chunking, internal storage has to hold three files of more than 4 GB each (the 2 existing backups plus a temporary copy until the new backup is finished). The backup is encrypted, so putting it on the SD card is no problem. That is also the reason given in the original description of this PR.
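
As a rough illustration of this arithmetic, here is a minimal Python sketch. The function names and the default of two retained backups are assumptions taken from this comment, not from Signal's code. The FAT32 limit of 2^32 - 1 bytes per file is a property of the file system itself.

```python
FAT32_MAX_FILE_BYTES = 2**32 - 1   # largest single file FAT32 can store
GIB = 1024**3


def fits_on_fat32(file_size_bytes: int) -> bool:
    """True if a single file of this size can be stored on FAT32."""
    return file_size_bytes <= FAT32_MAX_FILE_BYTES


def peak_storage_bytes(backup_size_bytes: int, retained_backups: int = 2) -> int:
    """Space needed while a new backup is written next to the retained ones."""
    return backup_size_bytes * (retained_backups + 1)


backup = 4 * GIB + 1                      # just over the FAT32 limit
single_file_ok = fits_on_fat32(backup)    # the unsplit file does not fit
peak = peak_storage_bytes(backup)         # three copies on internal storage

# The same backup split at a 1 GiB threshold: four full chunks plus 1 byte.
chunked_ok = all(fits_on_fat32(size) for size in (GIB, GIB, GIB, GIB, 1))
```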

Looking forward to it getting merged.

@newhinton
Contributor

newhinton commented Apr 17, 2022

@Smojo I don't disagree with you that the growing file size is a problem (it is, and it will get even worse with time).

However, the underlying issue is not that some file systems have limitations that chunking needs to work around, but that Signal backup files grow in size indefinitely.

Chunking DOES fix that, but only the symptom, not the problem.

I would like to see a solution where a single backup is split by year, and then only made 'on demand' (do we ALWAYS have to store chats again for 2016?)

This would solve the described issue AND the problem (as long as someone does not add more than 4 GB per year, but I deem that unlikely).

@Smojo

Smojo commented Apr 17, 2022

Yeah, this might be something, which is needed and helps as well.

Right now (I may be wrong) it is only possible to trim chats by message count (which also results in a smaller container and therefore smaller backup file[s]), but nobody can really tell what this means in terms of "message retention" for a single chat. Some chats will not even go back a year or so if they are very talkative.

So I would agree that chunking by time, which results in a chunked container AND backup files (which might also have a file size limit of 4 GB in addition to the split by time), would also solve the user's problem described here (without further thinking about symptom and problem ^^).

Generally I would ask:
Is there already

... a solution where a single backup is split by year ...

?
If not, why not merge this PR and at least mitigate the problem for some users now, even if it is not the 100% solution and does not address the "root" problem?

@lieblingsnerd

I would like to see a solution where a single backup is split by year, and then only made 'on demand' (do we ALWAYS have to store chats again for 2016?)

I think what the problem is is in the eye of the beholder. Incremental backups have the risk that an increment will break unnoticed. If, like me, you want to keep all your messages, the solution implemented here is generally reasonable and complete in itself. If you don't want to have old messages anymore, you could delete them in the application, then they wouldn't be part of a full backup anymore. Different people, different needs. Either way, this solution should be better than none. If you implement something better that would be all the cooler of course. But no reason to wait right?

I would be very happy if this PR would be merged. Then I wouldn't have to delete backups by hand every day 😃

@newhinton
Contributor

newhinton commented May 11, 2022

@lieblingsnerd Sure, different users have different needs. But there is an extremely good reason to not merge this/wait for now.

As far as I can see, the Signal devs are quite conservative when it comes to new features, and this will certainly be a breaking change (for backups). If the backup system is going to change, they will not do so lightly, and not often. That means taking the time to really think the issue through is a very good reason to wait.

Also: i do not consider 'delete some old messages' to be a valid solution. Backups are there because i DO want to keep old messages, deleting them cannot be the answer.

And afaik you still need to delete them, this pr only splits the backup into multiple files that are smaller than the filesystem boundaries.

@HyperCriSiS

I don't see it as a breaking change. The current state of the backup system is really bad because it is incomplete.
For some days now, Signal even loses the permission to the folder. I have to re-enable it every few days.

@newhinton
Contributor

@HyperCriSiS Any major change to the way backups are created is possibly a breaking change. This PR splits the single backup file into multiple files. That requires long-term support and, if not done correctly, may render old backups unusable. Therefore it could be breaking.

However, your problem does not seem related at all, have you checked existing issues and opened a proper ticket for your problem if no ticket exists?

@Smojo

Smojo commented May 11, 2022

@lieblingsnerd

Different people, different needs.

Agree!

Then I wouldn't have to delete backups by hand every day 😃

Agree ... doing the same as I'm out of storage + I cannot use the SD-card because of 4GB limit

@HyperCriSiS

Signal even loses the permission to the folder. I have to re-enable it every few days.

Also had this issue, but this PR will probably not solve that one ^^

I don't see it as a breaking change.

Agree. Why is file chunking a breaking change? (Writing this, I realize that I have no clue about coding ... still, it will not break the current backup logic / way of working ... all the UI stuff can stay, imho.)

Still, @newhinton might be correct with regard to backward compatibility etc.
If so I'm sure @Roghetti can comment on that.

I would be very happy if this PR would be merged.

+1
Again: Why not merge this PR and at least mitigate the problem for some users now, even if it is not the 100% solution and does not address the "root" problem?

@Grunthos

Grunthos commented Mar 7, 2023

If there is any plan in this process to allow even basic incremental backups (eg. chunking + rotating selection from tables based on time periods), then I would be happy to help on any aspect of the Android/common coding or testing side of things.

I have not looked at the data side of things, but a naive approach would seem to me to:

  • find all records <= (year1 = min(year))
  • backup via chunking
  • Finish the backup and start a 'new' backup to new base name
  • find all records from (year-1 to (year2=(year1-1))
  • backup via chunking
  • ditto all the way to yearN and year(N+1)
  • subsequent backups read old chunks and write new temporary chunks, discarding the temp chunks if the old chunk matches. Discard the old chunk if a discrepancy is found.

This should result in a single current and consistent backup, and only result in 'new' files for the recent year (unless the record formats have changed or data has been added for prior years). One could also chunk by month or some other user-defined interval.
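
The naive approach outlined above could be sketched roughly as follows in Python. Everything here is hypothetical: the record shape, the function names, and the in-memory dict standing in for per-year backup files are assumptions for illustration, and the sketch ignores encryption and size-based chunking within a year.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (year, serialized record bytes).
Record = Tuple[int, bytes]


def serialize_by_year(records: Iterable[Record]) -> Dict[int, bytes]:
    """Group records by year and serialize each group in export order."""
    by_year: Dict[int, List[bytes]] = defaultdict(list)
    for year, payload in records:
        by_year[year].append(payload)
    return {year: b"\n".join(parts) for year, parts in by_year.items()}


def update_backup(store: Dict[int, bytes], records: Iterable[Record]) -> List[int]:
    """Rewrite only the per-year entries whose content changed.

    `store` maps a year to the bytes of its backup file. Returns the
    years that were (re)written, in ascending order.
    """
    fresh = serialize_by_year(records)
    rewritten: List[int] = []
    for year in sorted(fresh):
        if store.get(year) == fresh[year]:
            continue   # temporary chunk matches the old one: keep the old
        store[year] = fresh[year]   # discrepancy: replace the old chunk
        rewritten.append(year)
    for year in [y for y in store if y not in fresh]:
        del store[year]   # no records left for this year
    return rewritten


store: Dict[int, bytes] = {}
first_run = update_backup(store, [(2016, b"a"), (2017, b"b")])
second_run = update_backup(store, [(2016, b"a"), (2017, b"b"), (2017, b"c")])
```

On the second run only the most recent year is rewritten, which is the behaviour the list above is aiming for.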

@stale

stale bot commented May 6, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label May 6, 2023
@Grunthos

Grunthos commented May 6, 2023

Chunked backups of some kind are an important feature.

@stale stale bot removed the wontfix label May 6, 2023
@stale

stale bot commented Jul 9, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jul 9, 2023
@Grunthos

Grunthos commented Jul 9, 2023

It would be great to have SOME solution along these lines!

@stale stale bot removed the wontfix label Jul 9, 2023
@janvlug

janvlug commented Jul 9, 2023

I once lost all my history because a backup failed due to the file being too large for the file system. This is very important and highly needed functionality.

@stale

stale bot commented Oct 9, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Oct 9, 2023
@janvlug

janvlug commented Oct 10, 2023

There is still an urgent need for this functionality.

@stale stale bot removed the wontfix label Oct 10, 2023

stale bot commented Dec 9, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Dec 9, 2023
@Grunthos

Grunthos commented Dec 9, 2023

This issue is not stale! It's just receiving the attention it deserves. The problem it addresses is not going away and is getting worse. It has resulted in me largely giving up on using Signal except in rare circumstances.

@Grunthos

Grunthos commented Dec 9, 2023

wontfix tag was added by the bot and should probably be removed since that represents policy by stagnation vs deliberative decision.

stale bot commented Feb 10, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Feb 10, 2024
@Grunthos

This still remains a crucial feature. The lack of it means that I now use signal for far fewer people and purposes.

@stale stale bot removed the wontfix label Feb 10, 2024

stale bot commented Apr 11, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Apr 11, 2024
@Xyaren

Xyaren commented Apr 11, 2024

bump

@stale stale bot removed the wontfix label Apr 11, 2024
@Grunthos

Grunthos commented Apr 11, 2024

And I second that bump. The current situation is untenable.

Sadly my vote should perhaps be discounted because this deficiency has meant I have largely given up on signal.

@Smojo

Smojo commented Apr 11, 2024

@greyson-signal over a year ago you commented ( #11900 (comment) ):

We won't be merging this as-is, but we do have some plans for trying to do some of this work in the coming weeks. I'll leave this open until we get around to it. Thanks!

What happened since then?
