Restore failure - Unexpected EOF - retrieved data backup size exceed expected size #7198

Open
m4teh opened this issue Jan 12, 2022 · 11 comments · May be fixed by QubesOS/qubes-core-admin#461
Labels
- affects-4.1 (This issue affects Qubes OS 4.1.)
- C: core
- diagnosed (Technical diagnosis has been performed; see issue comments.)
- P: major (Priority: major. Between "default" and "critical" in severity.)
- T: bug (Type: bug report. A problem or defect resulting in unintended behavior in something that exists.)

Comments

m4teh commented Jan 12, 2022

Qubes OS release

4.1.0-rc3

Brief summary

Sporadic but mostly failing restores (or corrupted backups) on 4.1.x installations, with backups of any size, small or large. Using Btrfs. Restoring on the same version of Qubes, on the same and on different systems.
The restore creates the AppVM but it contains no data. It displays an error yet concludes with "Finished successfully!", when it has not.

Only one reference to the same issue, from 2017, found here: https://groups.google.com/g/qubes-users/c/0i7RfWhoCxU

Backups and data do appear to restore successfully if running `qvm-backup-restore --ignore-size-limit` via dom0.

Forum discussion here: https://forum.qubes-os.org/t/backups-are-corrupted-on-restore/8446

Steps to reproduce

Create a backup and restore it.

Expected behavior

Successful restore containing all data with no errors.

Actual behavior

Restores an empty AppVM containing no data, with errors in the output.

Unable to extract files, unexpected EOF in archive tar, retrieved data backup size exceed expected size

7d496c01e2ff654edaf628a888af6336f19ed590

m4teh added the labels P: default and T: bug on Jan 12, 2022
andrewdavidwong added the labels C: core, needs diagnosis, and P: major, and removed P: default, on Jan 13, 2022
andrewdavidwong added this to the Release 4.1 milestone on Jan 13, 2022
@rustybird

Was the backup also created from a Btrfs system?

And do you remember if you resized one of the affected VMs while the backup was in progress?

m4teh (Author) commented Feb 3, 2022 via email

tquest1 commented Mar 8, 2022

I have exactly the same problem with Qubes 4.1, fully updated today. Also a Btrfs install. I have tried various combinations of backups: AppVMs and templates, single and multiple. Always the same error as m4teh in the original post, on full restore or verify-only:
Error: unable to extract files ... Unexpected EOF in archive tar ...

I have tried on multiple Qubes 4.1 Btrfs installs, different laptops all with same error.
Has there been any progress with this issue? It's quite a major bug; living with no easily restorable backups is asking for trouble.

rustybird commented Mar 8, 2022

@tquest1 I assume that (same as for @m4teh) the backup was created from a Btrfs system, and `qvm-backup-restore --ignore-size-limit` works around the issue?

If someone can post a reproducible way to generate an affected backup file, that would be extremely helpful. (Beware that any backup file also includes the full qubes.xml with metadata about all VMs - not only the backed up ones! - so it's not really safe to share the backup file itself, unless maybe it's from a throwaway Qubes system installed just for this purpose.)

tquest1 commented Mar 9, 2022

I don't really have a 'throwaway' install at the moment, but it does seem that it ALWAYS results in the same error on any Btrfs install. I have tried it on 2 colleagues' Qubes machines, even with a tiny VM of a few MB, and it is always the same result from the GUI.

rustybird commented Mar 9, 2022

Okay I think I found the problem:

When a VM is to be backed up, its disk usage value is queried and recorded in the backup metadata. This value is then used as an extraction limit during restore. But it's easy for the backup system to record a wrong value:

For one thing, the query happens at the beginning of the backup run, but it might take hours until it's that particular VM's turn to be backed up. If more data is saved by the VM in the meantime, it can trigger this bug.

The reason a mismatch would happen more often when backing up from e.g. Btrfs is that the 'file-reflink' driver (as well as the legacy 'file' driver) returns the live disk usage, whereas the 'lvm_thin' driver returns the committed disk usage.

Maybe we could fix this by having storage drivers implement an `export_usage()` volume method that - like `export_end()` - would operate on the reference returned by `export()`, and use an open file descriptor as the reference.

There's also a similar problem for firewall.xml and for any additional files from the backup-get-files event, ugh.
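
To make the race concrete, here is a toy simulation (not the actual Qubes backup/restore code; the VM names, sizes, and functions are invented) showing how a usage value recorded at the start of the run ends up smaller than the data that is eventually extracted:

```python
# Toy simulation of the race described above -- not the actual Qubes backup code.
# All VM names and sizes are made up for illustration.

def make_backup(vm_sizes):
    # Disk usage for every VM is recorded once, at the start of the backup run...
    recorded_limits = dict(vm_sizes)
    # ...but the VMs keep running; by the time 'work' is actually exported,
    # it has written more data than the recorded value.
    vm_sizes['work'] += 500 * 2**20
    archive = dict(vm_sizes)
    return archive, recorded_limits

def restore(archive, recorded_limits):
    for name, size in archive.items():
        # The recorded value is used as an extraction limit during restore.
        if size > recorded_limits[name]:
            raise IOError(f'{name}: retrieved data backup size exceed expected size')

archive, limits = make_backup({'work': 10 * 2**30, 'personal': 2 * 2**30})
restore(archive, limits)   # fails for 'work', even though the backup data is intact
```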

@rustybird

Should the restore code maybe just disable the byte size limit (not necessarily the file count limit) unconditionally, at least for hopefully well-authenticated format v4+ backups?

This looks like it has always been racy, and it might take a while to fully fix, and even then some people will still have their existing backups with wrong size information.
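
Purely as a hypothetical sketch (the function and parameter names below are not from the actual restore code), the suggestion would amount to something like:

```python
def effective_byte_limit(backup_format_version, recorded_usage):
    """Hypothetical helper, not the actual restore code: drop the byte size
    limit for integrity-protected format v4+ backups and keep it only for
    older formats. The file count limit would stay in place either way."""
    if backup_format_version >= 4:
        return None
    return recorded_usage
```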

marmarek (Member) commented Mar 9, 2022 via email

andrewdavidwong added the diagnosed label and removed the needs diagnosis label on Mar 9, 2022
@marmarek (Member)

> Maybe we could fix this by having storage drivers implement an `export_usage()` volume method that - like `export_end()` - would operate on the reference returned by `export()`

One issue with this idea is that `export()` is called much later than the size is needed. That's because (among other reasons) the user gets a backup summary to accept, which includes those sizes, and only then can the backup process start. I don't want any changes to the system state before the backup has actually started. Calling `export()` when preparing this summary would violate that.

But as said above, we can enforce the size limit in a different way (per-file, not per-volume or per-whole-backup). Those reported sizes are an approximation anyway, because the backup is then compressed.
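
For illustration only (this is not the actual qvm-backup-restore extraction path; the cap value and function are invented), enforcing the limit per extracted file rather than per volume could look roughly like this:

```python
import tarfile

# Made-up per-file cap; the real value would be chosen by the restore code.
PER_FILE_LIMIT = 100 * 2**30

def extract_with_per_file_limit(archive_path, dest_dir):
    """Reject any single archive member above the cap, instead of comparing
    the summed volume data against a usage value recorded at backup time
    (which is the check that currently misfires)."""
    with tarfile.open(archive_path) as tar:
        for member in tar:
            if member.size > PER_FILE_LIMIT:
                raise IOError(f'{member.name} exceeds the per-file size limit')
            tar.extract(member, path=dest_dir)
```

A cap like this only guards against a single absurdly oversized archive member; it no longer depends on the per-volume usage recorded while the VM was still running.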

DemiMarie added a commit to DemiMarie/qubes-core-admin that referenced this issue Apr 1, 2022
Volumes returned by `export()` must be immutable, since otherwise the
backup will be inconsistent.  Ensure this by exporting a snapshot of the
volume, not the volume itself.

Fixes QubesOS/qubes-issues#7198.

FIXME: tests
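
A minimal sketch of that approach (not the actual qubes-core-admin Volume API; the "snapshot" is simulated here with a plain file copy, where a real driver would use a Btrfs reflink copy or an LVM thin snapshot):

```python
import os
import shutil

class SketchVolume:
    """Illustration only: export a frozen copy of the volume so its data
    (and therefore its size) cannot change underneath a running backup."""

    def __init__(self, path):
        self.path = path

    def export(self):
        # Freeze the data that will be backed up.
        snap = self.path + '.export-snap'
        shutil.copyfile(self.path, snap)   # stand-in for a real snapshot
        return snap

    def export_end(self, snap):
        # Release the frozen copy once this volume has been backed up.
        os.remove(snap)
```

The point is that everything downstream, including any size measurement, then operates on the frozen copy rather than on the live volume.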
scallyob commented Apr 2, 2022

Just wanted to say I've been experiencing this on an ext4 install as well. It really became a problem when I stopped compressing my backups. Compressing seems to greatly reduce if not eliminate this problem.

@rustybird

> It really became a problem when I stopped compressing my backups. Compressing seems to greatly reduce if not eliminate this problem.

Yep, makes sense.

Sorry I haven't gotten around to implementing Marek's suggested per-file size limit yet. But `qvm-backup-restore --ignore-size-limit` should be fine as a workaround for the moment. (Restore already invokes `qfile-dom0-unpacker` with an argument that makes it wait if there's less than 500 MiB of free space, instead of completely filling up the disk.)

DemiMarie added further commits to DemiMarie/qubes-core-admin referencing this issue between May 3 and May 24, 2022, all with the same commit message.
andrewdavidwong added the affects-4.1 label on Aug 8, 2023
andrewdavidwong removed this from the Release 4.1 updates milestone on Aug 13, 2023