Restored data does not always match original (Ceph bug) #136

Closed
bbs2web opened this issue Jan 13, 2022 · 21 comments

@bbs2web

bbs2web commented Jan 13, 2022

I have a particular Ceph RBD image that is corrupt when I restore it. Yesterday I additionally removed the snapshot, which resulted in last evening's backup being a full backup. A deep scrub completes without errors, as does a deep scrub against the snapshot Benji created when initiating yesterday's backup. Comparing SHA sums of 4 MiB blocks of the snapshot and the restored image, however, shows differences.

I presume a possible hash collision, or perhaps something going out of alignment during the restore process?

The image isn't massive: 99 GiB allocated out of 200 GiB. The following is a much smaller test (124 MiB) where everything works as expected, so I presume this isn't a configuration/S3/Ceph problem:

I'm running Ceph Pacific 16.2.7 on Proxmox (Debian 11.2) with Benji 0.15.0:

    INFO: Benji version: 0.15.0.
    INFO: Configuration version: 1.0.0, supported >=1,<2.
    INFO: Metadata version: 3.0.0, supported >=1,<4.
    INFO: Object metadata version: 2.0.0, supported >=1,<3.
(benji) [admin@kvm5b ~]# rbd create rbd_ssd/vm-448-disk-0_TEMP2 --size 200M
(benji) [admin@kvm5b ~]# rbd map rbd_ssd/vm-448-disk-0_TEMP2
/dev/rbd36
(benji) [admin@kvm5b ~]# mkfs.ext3 /dev/rbd36
(benji) [admin@kvm5b ~]# mount /dev/rbd36 /media
(benji) [admin@kvm5b ~]# cp /boot/vmlinuz-5.13.19-* /media
(benji) [admin@kvm5b ~]# sync
(benji) [admin@kvm5b ~]# umount /media
(benji) [admin@kvm5b ~]# rbd unmap /dev/rbd36
(benji) [admin@kvm5b ~]# rbd du rbd_ssd/vm-448-disk-0_TEMP2
NAME                 PROVISIONED  USED
vm-448-disk-0_TEMP2      200 MiB  124 MiB

# Backup the image:
(benji) [admin@kvm5b ~]# benji::backup::ceph temp-disk0 rbd_ssd vm-448-disk-0_TEMP2 Troubleshooting;
No previous RBD snapshot found, reverting to initial backup.
Performing initial backup of temp-disk0:rbd_ssd/vm-448-disk-0_TEMP2.
Creating snap: 100% complete...done.
    INFO: Starting backup of rbd:rbd_ssd/vm-448-disk-0_TEMP2@b-2022-01-13T14:50:56 to storage radosgw-kvm7, the newly created version is temp-disk0-sngjp6.
    INFO: Active transforms for storage radosgw-kvm7: zstd (zstd), aes_256_gcm (aes_256_gcm).
    INFO: Backed up 1/31 blocks (3.2%)
    INFO: Backed up 2/31 blocks (6.5%)
<snip>
    INFO: Backed up 30/31 blocks (96.8%)
    INFO: Backed up 31/31 blocks (100.0%)
    INFO: Set status of version temp-disk0-sngjp6 to valid.
    INFO: Backed up metadata of version temp-disk0-sngjp6.
    INFO: Backup successful.
    INFO: Added label(s) to version temp-disk0-sngjp6: Troubleshooting=.

# Backup is reported as being valid:
(benji) [admin@kvm5b ~]# benji ls 'volume == "temp-disk0"';
+---------------------+-------------------+------------+-----------------------+----------+------------+--------+-----------+--------------+
|         date        | uid               | volume     | snapshot              |     size | block_size | status | protected | storage      |
+---------------------+-------------------+------------+-----------------------+----------+------------+--------+-----------+--------------+
| 2022-01-13T14:50:58 | temp-disk0-sngjp6 | temp-disk0 | b-2022-01-13T14:50:56 | 200.0MiB |     4.0MiB | valid  |   False   | radosgw-kvm7 |
+---------------------+-------------------+------------+-----------------------+----------+------------+--------+-----------+--------------+


# Deep scrub is successful:
(benji) [admin@kvm5b ~]# benji deep-scrub temp-disk0-sngjp6
    INFO: Active transforms for storage radosgw-kvm7: zstd (zstd), aes_256_gcm (aes_256_gcm).
    INFO: Deep-scrubbed 1/31 blocks (3.2%)
    INFO: Deep-scrubbed 2/31 blocks (6.5%)
<snip>
    INFO: Deep-scrubbed 30/31 blocks (96.8%)
    INFO: Deep-scrubbed 31/31 blocks (100.0%)
    INFO: Set status of version temp-disk0-sngjp6 to valid.
    INFO: Deep-scrub of version temp-disk0-sngjp6 successful.

# Deep scrub with source image comparison:
(benji) [admin@kvm5b ~]# benji deep-scrub -s rbd:rbd_ssd/vm-448-disk-0_TEMP2@b-2022-01-13T14:50:56 temp-disk0-sngjp6
    INFO: Active transforms for storage radosgw-kvm7: zstd (zstd), aes_256_gcm (aes_256_gcm).
    INFO: Deep-scrubbed 1/31 blocks (3.2%)
    INFO: Deep-scrubbed 2/31 blocks (6.5%)
<snip>
    INFO: Deep-scrubbed 30/31 blocks (96.8%)
    INFO: Deep-scrubbed 31/31 blocks (100.0%)
    INFO: Set status of version temp-disk0-sngjp6 to valid.
    INFO: Deep-scrub of version temp-disk0-sngjp6 successful.

# Restore is successful:
(benji) [admin@kvm5b ~]# benji restore --sparse temp-disk0-sngjp6 rbd:rbd_ssd/vm-448-disk-0_RESTORED;
    INFO: Active transforms for storage radosgw-kvm7: zstd (zstd), aes_256_gcm (aes_256_gcm).
    INFO: Restored 1/31 blocks (3.2%)
    INFO: Restored 2/31 blocks (6.5%)
<snip>
    INFO: Restored 30/31 blocks (96.8%)
    INFO: Restored 31/31 blocks (100.0%)
    INFO: Successfully restored version temp-disk0-sngjp6 in 03s with 36.5MiB/s.


# Map the images and compare the SHA sums of all 4 MiB blocks:
(benji) [root@kvm5b ~]# rbd map rbd_ssd/vm-448-disk-0_TEMP2@b-2022-01-13T14:50:56
/dev/rbd36
(benji) [root@kvm5b ~]# rbd map rbd_ssd/vm-448-disk-0_RESTORED
/dev/rbd37

perl -ne 'use Digest::SHA qw(sha1_base64);BEGIN{$/=\4194304};print sha1_base64($_)."\n"' /dev/rbd36 > 1
perl -ne 'use Digest::SHA qw(sha1_base64);BEGIN{$/=\4194304};print sha1_base64($_)."\n"' /dev/rbd37 > 2
diff -uNr 1 2;
<no output>

The above steps appear to demonstrate that Benji does generally work; we have observed no other instances of data ever corrupting during a restore.

Just to clarify: a Benji deep scrub of the problematic backup yields no errors. A deep scrub comparing the data against the snapshot also completes without errors. Restoring that image, however, yields differences in the SHA sums when I compute sha1_base64 over 4 MiB blocks of the original and restored RBD images.
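The Perl one-liners above can be sketched equivalently in Python (the paths are placeholders; any two equal-sized block devices or files work):

```python
import base64
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB, matching Benji's block size


def block_digests(path):
    """Yield one digest per 4 MiB block, formatted like Perl's sha1_base64
    (base64 of the SHA-1 digest, with the trailing '=' padding stripped)."""
    with open(path, 'rb') as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            digest = hashlib.sha1(block).digest()
            yield base64.b64encode(digest).decode().rstrip('=')


def compare(path_a, path_b):
    """Return the 0-based indices of blocks whose digests differ."""
    return [i for i, (a, b) in enumerate(zip(block_digests(path_a),
                                             block_digests(path_b))) if a != b]
```

Unlike diffing the two digest files, this reports the block indices directly, which maps straight onto Benji's block numbering.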

bbs2web added the "bug" label Jan 13, 2022
@bbs2web
Author

bbs2web commented Jan 13, 2022

The problematic image I have is 200 GiB; I'm trying to shrink it to a size that would allow uploading it to reproduce the problem...

@bbs2web
Author

bbs2web commented Jan 13, 2022

The following are the same steps, which again yield no errors but result in a corrupted image after it's restored.

Benji configuration file:

configurationVersion: '1'
logFile: /var/log/benji.log
databaseEngine: postgresql://benji-kvm5:********************************@postgresql.redacted.com:5432/benji-kvm5
defaultStorage: radosgw-kvm7
storages:
  - name: radosgw-kvm7
    module: s3
    storageId: 1
    configuration:
      simultaneousReads: 12
      simultaneousWrites: 12
      activeTransforms:
        - zstd
        - aes_256_gcm
      awsAccessKeyId: '********************'
      awsSecretAccessKey: '***************************************'
      bucketName: benji-kvm5
      endpointUrl: https://kvm7b.redacted.com:7480/
ios:
  - name: file
    module: file
  - name: rbd
    module: rbdaio
    configuration:
      simultaneousReads: 12
      simultaneousWrites: 12
      cephConfigFile: /etc/ceph/ceph.conf
      clientIdentifier: admin
      newImageFeatures:
        - RBD_FEATURE_LAYERING
        - RBD_FEATURE_EXCLUSIVE_LOCK
        - RBD_FEATURE_STRIPINGV2
        - RBD_FEATURE_OBJECT_MAP
        - RBD_FEATURE_FAST_DIFF
#        - RBD_FEATURE_DEEP_FLATTEN
transforms:
  - name: zstd
    module: zstd
    configuration:
      level: 4
  - name: aes_256_gcm
    module: aes_256_gcm
    configuration:
      kdfSalt: '********************************************'
      kdfIterations: 1000
      password: '********************************'
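For context, the aes_256_gcm transform derives its encryption key from the configured password, kdfSalt and kdfIterations. A rough sketch of that kind of password-based key derivation (the specific KDF used here, PBKDF2-HMAC-SHA256, is an assumption for illustration, not Benji's documented choice):

```python
import hashlib


def derive_key(password: str, kdf_salt: bytes, kdf_iterations: int) -> bytes:
    """Derive a 32-byte (256-bit) key for AES-256-GCM from the configured
    password, kdfSalt and kdfIterations. The hash function (SHA-256) is an
    assumption for illustration, not necessarily what Benji uses."""
    return hashlib.pbkdf2_hmac('sha256', password.encode(), kdf_salt,
                               kdf_iterations, dklen=32)
```

The derivation is deterministic, which is why the same kdfSalt, kdfIterations and password must be configured on every host that reads the same storage.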

Problematic image where I can consistently reproduce corruption when restoring images:

# Image type and allocated space:
(benji) [admin@kvm5b ~]# rbd info rbd_ssd/vm-448-disk-0_TEMP;
rbd image 'vm-448-disk-0_TEMP':
        size 200 GiB in 51200 objects
        order 22 (4 MiB objects)
        snapshot_count: 1
        id: 3319aff9d321bb
        block_name_prefix: rbd_data.3319aff9d321bb
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff
        op_features:
        flags:
        create_timestamp: Thu Jan 13 14:45:45 2022
        access_timestamp: Thu Jan 13 20:36:59 2022
        modify_timestamp: Thu Jan 13 14:47:47 2022
[admin@kvm5b ~]# rbd du rbd_ssd/vm-448-disk-0_TEMP;
NAME                PROVISIONED  USED
vm-448-disk-0_TEMP      200 GiB  99 GiB

# Backup the problematic image:
(benji) [admin@kvm5b ~]# benji::backup::ceph temp-disk0 rbd_ssd vm-448-disk-0_TEMP Troubleshooting;
No previous RBD snapshot found, reverting to initial backup.
Performing initial backup of temp-disk0:rbd_ssd/vm-448-disk-0_TEMP.
Creating snap: 100% complete...done.
    INFO: Starting backup of rbd:rbd_ssd/vm-448-disk-0_TEMP@b-2022-01-13T20:29:49 to storage radosgw-kvm7, the newly created version is temp-disk0-ofno6z.
    INFO: Active transforms for storage radosgw-kvm7: zstd (zstd), aes_256_gcm (aes_256_gcm).
    INFO: Backed up 127/25250 blocks (0.5%)
    INFO: Backed up 254/25250 blocks (1.0%)
<snip>
    INFO: Backed up 25146/25250 blocks (99.6%)
    INFO: Backed up 25250/25250 blocks (100.0%)
    INFO: Set status of version temp-disk0-ofno6z to valid.
    INFO: Backed up metadata of version temp-disk0-ofno6z.
    INFO: Backup successful.
    INFO: Added label(s) to version temp-disk0-ofno6z: Troubleshooting=.

# Backup is reported as being valid:
(benji) [admin@kvm5b ~]# benji ls 'volume == "temp-disk0"';
+---------------------+-------------------+------------+-----------------------+----------+------------+--------+-----------+--------------+
|         date        | uid               | volume     | snapshot              |     size | block_size | status | protected | storage      |
+---------------------+-------------------+------------+-----------------------+----------+------------+--------+-----------+--------------+
| 2022-01-13T20:29:51 | temp-disk0-ofno6z | temp-disk0 | b-2022-01-13T20:29:49 | 200.0GiB |     4.0MiB | valid  |   False   | radosgw-kvm7 |
+---------------------+-------------------+------------+-----------------------+----------+------------+--------+-----------+--------------+

# Deep scrub is successful:
(benji) [admin@kvm5b ~]# benji deep-scrub temp-disk0-ofno6z;
    INFO: Active transforms for storage radosgw-kvm7: zstd (zstd), aes_256_gcm (aes_256_gcm).
    INFO: Deep-scrubbed 127/25250 blocks (0.5%)
    INFO: Deep-scrubbed 254/25250 blocks (1.0%)
<snip>
    INFO: Deep-scrubbed 25146/25250 blocks (99.6%)
    INFO: Deep-scrubbed 25250/25250 blocks (100.0%)
    INFO: Set status of version temp-disk0-ofno6z to valid.
    INFO: Deep-scrub of version temp-disk0-ofno6z successful.

# Deep scrub with source image comparison:
(benji) [admin@kvm5b ~]# benji deep-scrub -s rbd:rbd_ssd/vm-448-disk-0_TEMP@b-2022-01-13T20:29:49 temp-disk0-ofno6z;
    INFO: Active transforms for storage radosgw-kvm7: zstd (zstd), aes_256_gcm (aes_256_gcm).
    INFO: Deep-scrubbed 127/25250 blocks (0.5%)
    INFO: Deep-scrubbed 254/25250 blocks (1.0%)
<snip>
    INFO: Deep-scrubbed 25146/25250 blocks (99.6%)
    INFO: Deep-scrubbed 25250/25250 blocks (100.0%)
    INFO: Set status of version temp-disk0-ofno6z to valid.
    INFO: Deep-scrub of version temp-disk0-ofno6z successful.

# Restore is successful:
(benji) [admin@kvm5b ~]# benji restore --sparse temp-disk0-ofno6z rbd:rbd_ssd/vm-448-disk-0_RESTORED;
    INFO: Active transforms for storage radosgw-kvm7: zstd (zstd), aes_256_gcm (aes_256_gcm).
    INFO: Restored 127/25250 blocks (0.5%)
    INFO: Restored 254/25250 blocks (1.0%)
<snip>
    INFO: Restored 25146/25250 blocks (99.6%)
    INFO: Restored 25250/25250 blocks (100.0%)
    INFO: Successfully restored version temp-disk0-ofno6z in 08m 16s with 203.5MiB/s.

# Image type and allocated space:
(benji) [admin@kvm5b ~]# rbd info rbd_ssd/vm-448-disk-0_RESTORED;
rbd image 'vm-448-disk-0_RESTORED':
        size 200 GiB in 51200 objects
        order 22 (4 MiB objects)
        snapshot_count: 0
        id: 389f5a99da5c70
        block_name_prefix: rbd_data.389f5a99da5c70
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff
        op_features:
        flags:
        create_timestamp: Thu Jan 13 22:15:50 2022
        access_timestamp: Thu Jan 13 22:15:50 2022
        modify_timestamp: Thu Jan 13 22:22:57 2022
[admin@kvm5b ~]# rbd du rbd_ssd/vm-448-disk-0_RESTORED;
NAME                    PROVISIONED  USED
vm-448-disk-0_RESTORED      200 GiB  100 GiB

# Map the images and compare the SHA sums of all 4 MiB blocks:
rbd map rbd_ssd/vm-448-disk-0_TEMP@b-2022-01-13T20:29:49
rbd map rbd_ssd/vm-448-disk-0_RESTORED
perl -ne 'use Digest::SHA qw(sha1_base64);BEGIN{$/=\4194304};print sha1_base64($_)."\n"' /dev/rbd36 > 1
perl -ne 'use Digest::SHA qw(sha1_base64);BEGIN{$/=\4194304};print sha1_base64($_)."\n"' /dev/rbd37 > 2
diff -uNr 1 2 | head -n 50;
<Many differences!>
--- 1   2022-01-13 22:49:35.132182778 +0200
+++ 2   2022-01-13 23:12:55.520197734 +0200
@@ -9,71 +9,71 @@
 dllv9Firo1+n8wailkk8BxiR9Oc
 3s6UW5eDENC4dvxu10/W+fTVpqQ
 K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
-t3YEVbQ9wr0eccw1VJ5iQM3/kBc
 K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
-IaueIqo7iwqoM9nxWIBalbruCks
 K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
-fjsPCW2Jqpxb8BuSh0FOsAkdltU
 K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
-IaueIqo7iwqoM9nxWIBalbruCks
 K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
-LLnJNQCYYe6fBpvu4dISugANa1Y
 K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
-IaueIqo7iwqoM9nxWIBalbruCks
 K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
-IaueIqo7iwqoM9nxWIBalbruCks
 K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
-IaueIqo7iwqoM9nxWIBalbruCks
+K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
+K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
+K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
+K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
+K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
+K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
+K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
+K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
 K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
 GZs85snEkB2BQdJOQV2uZTdup/I
-tQ0TIH0VF2P7CcQIbFONPoO4Rks
-IaueIqo7iwqoM9nxWIBalbruCks
 K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
-aVxBoPoz/N4wXVuL/gunagRONhc
 K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
-IaueIqo7iwqoM9nxWIBalbruCks
+K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
+K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
+K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
+K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
 K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
 JttkS6OyaOQJ1DcFxLLBRwQvQ9A
-S48n1cmHXq47ER9LgxjlOv1Laso
-O8jg6qKJkOABDMJIh3glHw9JfdE
 K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
-IaueIqo7iwqoM9nxWIBalbruCks
 K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
-IaueIqo7iwqoM9nxWIBalbruCks
 K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M

@bbs2web
Author

bbs2web commented Jan 14, 2022

The SHA string for a 4 MiB block of zeros is 'K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M'. It looks to me like those blocks should contain other data but get restored as zeros.

Calculating sha1 sum of 4 MiB of zeros:

[admin@kvm5i ~]# dd if=/dev/zero of=/root/temp bs=4M count=1
1+0 records in
1+0 records out
4194304 bytes (4.2 MB, 4.0 MiB) copied, 0.00443111 s, 947 MB/s
[admin@kvm5i ~]# perl -ne 'use Digest::SHA qw(sha1_base64);BEGIN{$/=\4194304};print sha1_base64($_)."\n"' /root/temp
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
[admin@kvm5i ~]# rm -f /root/temp
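The same computation in Python (Perl's sha1_base64 returns base64 without the trailing '=' padding, so it has to be stripped to match):

```python
import base64
import hashlib


def sha1_base64(data: bytes) -> str:
    # Match Perl Digest::SHA's sha1_base64: base64-encoded SHA-1 digest
    # with the '=' padding stripped.
    return base64.b64encode(hashlib.sha1(data).digest()).decode().rstrip('=')


zero_block = b'\x00' * (4 * 1024 * 1024)  # 4 MiB of zeros
print(sha1_base64(zero_block))  # matches the Perl output above
```

This confirms that every 'K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M' line in the diff is a block that reads back as all zeros.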

@bbs2web
Author

bbs2web commented Jan 20, 2022

Just to clarify the above:
We essentially have a certain RBD image which backs up fine, deep scrubs are fine, and a deep scrub comparing it to the backup snapshot completes without errors, but the restored data does not match what was backed up.

The symptom is that running a file system integrity check on the backed-up snapshot reports no errors, whilst the restored image reports errors.

@dren-dk

dren-dk commented Jan 21, 2022

Holy shit, that's bad! I can't really help, other than to note that benji seems a bit abandony, so I hope someone will pick up maintainership of it soon.

There are critical PRs waiting several months to be merged or at least commented on, so that doesn't bode well for the vitality of the project, either.

elemental-lf self-assigned this Feb 24, 2022
@elemental-lf
Owner

@bbs2web this actually sounds bad, but as already noted in another issue I'm having a hard time finding the time to work on Benji. I'd still like to help as best I can at the moment:

  • Have you tried switching from the rbdaio module to the old rbd module? I don't think that it is the problem here but it's worth a shot.
  • I've identified a loophole in the deep-scrubbing code that excludes sparse blocks from checking even if a source image is provided:

    benji/src/benji/benji.py

    Lines 223 to 226 in 23d1fc1

    if not block.uid:
        logger.debug('{} of block {} of version {} (UID {}) skipped (sparse).'.format(
            'Deep-scrub' if deep_scrub else 'Scrub', block.idx, version.uid, block.uid))
        continue

    This might explain why you're not seeing any errors while running a deep-scrub against the snapshot.
  • I think we can rule out that something is wrong in Ceph's reporting of sparse and changed blocks, as rbd diff is not involved in your example: I'm seeing the message No previous RBD snapshot found, reverting to initial backup.
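A minimal sketch of what closing that loophole looks like (hypothetical function and parameter names, not Benji's actual internals): when a source is supplied, a sparse block should be verified to correspond to all-zero source data instead of being skipped.

```python
BLOCK_SIZE = 4 * 1024 * 1024  # Benji's 4 MiB block size


def deep_scrub_block(block_uid, block_idx, read_source_block, read_storage_block):
    """Return True if the block checks out, False on a mismatch.

    read_source_block(idx) reads a block from the source snapshot and returns
    None when no source was given; read_storage_block(uid) reads the stored
    block. These names and signatures are illustrative, not Benji's real API.
    """
    source = read_source_block(block_idx)
    if block_uid is None:
        # Sparse block. The loophole was skipping it unconditionally; with a
        # source present it must correspond to all-zero source data.
        return source is None or source == b'\x00' * BLOCK_SIZE
    stored = read_storage_block(block_uid)
    return source is None or stored == source
```

With the original code, the `block_uid is None` branch returned success unconditionally, so a backup that wrongly recorded a data block as sparse would still pass a deep-scrub against the source.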

@elemental-lf
Owner

@bbs2web if you still have your test case could you give this PR/branch a try: #139? It includes code to also check sparse blocks against the source snapshot. The assumption would be that you're going to see some errors now when deep-scrubbing against the source snapshot.

@bbs2web
Author

bbs2web commented Mar 14, 2022

Apologies for my absence. I applied the changes to /usr/local/benji/lib/python3.9/site-packages/benji/benji.py and /usr/local/benji/lib/python3.9/site-packages/benji/database.py. When I then ran a source-image deep-scrub it did indeed report changes, but it didn't actually mark the image as invalid:

PS: I found a smaller image that demonstrates the problem (40 GiB).

(benji) [admin@kvm5i ~]# benji deep-scrub -s rbd:rbd_ssd/vm-401-disk-0@b-2022-03-14T20:33:49 temp-disk0-30bp3i;
    INFO: Active transforms for storage radosgw-kvm7: zstd (zstd), aes_256_gcm (aes_256_gcm).
    INFO: Deep-scrubbed 16/3162 blocks (0.5%)
    INFO: Deep-scrubbed 32/3162 blocks (1.0%)
    INFO: Deep-scrubbed 48/3162 blocks (1.5%)
<snip>
    INFO: Deep-scrubbed 3136/3162 blocks (99.2%)
    INFO: Deep-scrubbed 3152/3162 blocks (99.7%)
    INFO: Deep-scrubbed 3162/3162 blocks (100.0%)
   ERROR: Source data has changed for block 1 of version temp-disk0-30bp3i: It should be sparse but it is not.
   ERROR: Source data has changed for block 2 of version temp-disk0-30bp3i: It should be sparse but it is not.
   ERROR: Source data has changed for block 636 of version temp-disk0-30bp3i: It should be sparse but it is not.
   ERROR: Source data has changed for block 637 of version temp-disk0-30bp3i: It should be sparse but it is not.
   ERROR: Source data has changed for block 1141 of version temp-disk0-30bp3i: It should be sparse but it is not.
   ERROR: Source data has changed for block 1142 of version temp-disk0-30bp3i: It should be sparse but it is not.
   ERROR: Source data has changed for block 1175 of version temp-disk0-30bp3i: It should be sparse but it is not.
   ERROR: Source data has changed for block 10239 of version temp-disk0-30bp3i: It should be sparse but it is not.
   ERROR: Version temp-disk0-30bp3i had source mismatches.
   ERROR: Marked version temp-disk0-30bp3i as invalid because it has errors.
   ERROR: ScrubbingError: Deep-scrub of version temp-disk0-30bp3i failed.

Image is reported as being valid though:

(benji) [admin@kvm5i ~]# benji ls 'uid == "temp-disk0-30bp3i"';
+---------------------+-------------------+------------+-----------------------+---------+------------+--------+-----------+--------------+
|         date        | uid               | volume     | snapshot              |    size | block_size | status | protected | storage      |
+---------------------+-------------------+------------+-----------------------+---------+------------+--------+-----------+--------------+
| 2022-03-14T20:33:52 | temp-disk0-30bp3i | temp-disk0 | b-2022-03-14T20:33:49 | 40.0GiB |     4.0MiB | valid  |   False   | radosgw-kvm7 |
+---------------------+-------------------+------------+-----------------------+---------+------------+--------+-----------+--------------+

When I run the image-comparison deep scrub again it reports the exact same information, and the image remains marked as valid. Running a standard incremental backup predictably didn't change the outcome. Removing the b- snapshot and running the backup again performed a full backup, which again yielded the same errors when comparing the newly re-created snapshot (created as part of the last full backup) against the resulting backup image.

No change either when I retrieve the files in their entirety from the scrub-sparse-blocks branch.

The report of blocks 1 and 2 not matching now appears consistent with restored images failing to boot (they fail while loading the boot loader, e.g. GRUB2).

@bbs2web
Author

bbs2web commented Mar 14, 2022

Many thanks for the time that you are able to provide, Benji has been a great help on numerous occasions. No change either when I delete the snapshot and run a full backup using the rbd module, instead of rbdaio. Identical result to my last message.

@elemental-lf
Owner

Thanks for your tests! I still have no clue what might be going wrong here, so I concentrated on improving the scrub-sparse-blocks branch so that we can better assess the extent of the problem and maybe see a pattern in the affected versions or blocks.
It now properly reports its progress and will also mark a version as invalid if sparse blocks exist where they shouldn't. I've also enhanced the scripts used for the Kubernetes integration so that it is possible to run a deep-scrub with source compare directly after the backup.

@bbs2web
Author

bbs2web commented Mar 15, 2022

Many thanks, it's now more verbose and flags the image as invalid:

(benji) [admin@kvm5i ~]# benji ls 'uid == "temp-disk0-5miip1"'
+---------------------+-------------------+------------+-----------------------+---------+------------+--------+-----------+--------------+
|         date        | uid               | volume     | snapshot              |    size | block_size | status | protected | storage      |
+---------------------+-------------------+------------+-----------------------+---------+------------+--------+-----------+--------------+
| 2022-03-15T20:33:55 | temp-disk0-5miip1 | temp-disk0 | b-2022-03-15T20:33:52 | 40.0GiB |     4.0MiB | valid  |   False   | radosgw-kvm7 |
+---------------------+-------------------+------------+-----------------------+---------+------------+--------+-----------+--------------+
(benji) [admin@kvm5i ~]# rbd ls rbd_ssd -l | grep 401
vm-401-disk-0                            40 GiB  rbd_ssd/base-441-disk-0@__base__    2        excl
vm-401-disk-0@b-2022-03-15T20:33:52      40 GiB  rbd_ssd/base-441-disk-0@__base__    2
(benji) [admin@kvm5i ~]# benji deep-scrub -s rbd:rbd_ssd/vm-401-disk-0@b-2022-03-15T20:33:52 temp-disk0-5miip1
    INFO: Active transforms for storage radosgw-kvm7: zstd (zstd), aes_256_gcm (aes_256_gcm).
    INFO: Deep-scrubbed 16/3180 blocks (0.5%)
    INFO: Deep-scrubbed 32/3180 blocks (1.0%)
<snip>
    INFO: Deep-scrubbed 3168/3180 blocks (99.6%)
    INFO: Deep-scrubbed 3180/3180 blocks (100.0%)
   ERROR: Source data has changed for block 1 of version temp-disk0-5miip1: It should be sparse but it is not.
   ERROR: Source data has changed for block 2 of version temp-disk0-5miip1: It should be sparse but it is not.
    INFO: Checked sparse 36/7060 blocks (0.5%)
    INFO: Checked sparse 72/7060 blocks (1.0%)
   ERROR: Source data has changed for block 636 of version temp-disk0-5miip1: It should be sparse but it is not.
   ERROR: Source data has changed for block 637 of version temp-disk0-5miip1: It should be sparse but it is not.
    INFO: Checked sparse 108/7060 blocks (1.5%)
    INFO: Checked sparse 144/7060 blocks (2.0%)
    INFO: Checked sparse 180/7060 blocks (2.5%)
    INFO: Checked sparse 216/7060 blocks (3.1%)
    INFO: Checked sparse 252/7060 blocks (3.6%)
    INFO: Checked sparse 288/7060 blocks (4.1%)
   ERROR: Source data has changed for block 1141 of version temp-disk0-5miip1: It should be sparse but it is not.
   ERROR: Source data has changed for block 1142 of version temp-disk0-5miip1: It should be sparse but it is not.
   ERROR: Source data has changed for block 1175 of version temp-disk0-5miip1: It should be sparse but it is not.
    INFO: Checked sparse 324/7060 blocks (4.6%)
    INFO: Checked sparse 360/7060 blocks (5.1%)
<snip>
    INFO: Checked sparse 7020/7060 blocks (99.4%)
    INFO: Checked sparse 7056/7060 blocks (99.9%)
   ERROR: Source data has changed for block 10239 of version temp-disk0-5miip1: It should be sparse but it is not.
    INFO: Checked sparse 7060/7060 blocks (100.0%)
   ERROR: Version temp-disk0-5miip1 had sparse blocks where the source does not.
   ERROR: Set status of version temp-disk0-5miip1 to invalid.
   ERROR: Marked version temp-disk0-5miip1 as invalid because it has errors.
   ERROR: ScrubbingError: Deep-scrub of version temp-disk0-5miip1 failed.
(benji) [admin@kvm5i ~]# benji ls 'uid == "temp-disk0-5miip1"'
+---------------------+-------------------+------------+-----------------------+---------+------------+---------+-----------+--------------+
|         date        | uid               | volume     | snapshot              |    size | block_size |  status | protected | storage      |
+---------------------+-------------------+------------+-----------------------+---------+------------+---------+-----------+--------------+
| 2022-03-15T20:33:55 | temp-disk0-5miip1 | temp-disk0 | b-2022-03-15T20:33:52 | 40.0GiB |     4.0MiB | invalid |   False   | radosgw-kvm7 |
+---------------------+-------------------+------------+-----------------------+---------+------------+---------+-----------+--------------+

After a full backup a subsequent comparison is however still problematic:

(benji) [admin@kvm5i ~]# benji::backup::ceph temp-disk0 rbd_ssd vm-401-disk-0 Troubleshooting;
Snapshot found for rbd_ssd/vm-401-disk-0 is b-2022-03-15T20:33:52.
Existing RBD snapshot not found in Benji, deleting it and reverting to initial backup.
Removing snap: 100% complete...done.
Performing initial backup of temp-disk0:rbd_ssd/vm-401-disk-0.
Creating snap: 100% complete...done.
    INFO: Starting backup of rbd:rbd_ssd/vm-401-disk-0@b-2022-03-15T22:42:42 to storage radosgw-kvm7, the newly created version is temp-disk0-onw0nk.
    INFO: Active transforms for storage radosgw-kvm7: zstd (zstd), aes_256_gcm (aes_256_gcm).
    INFO: Backed up 21/4162 blocks (0.5%)
    INFO: Backed up 42/4162 blocks (1.0%)
<snip>
    INFO: Backed up 4158/4162 blocks (99.9%)
    INFO: Backed up 4162/4162 blocks (100.0%)
    INFO: Set status of version temp-disk0-onw0nk to valid.
    INFO: Backed up metadata of version temp-disk0-onw0nk.
    INFO: Backup successful.
    INFO: Added label(s) to version temp-disk0-onw0nk: Troubleshooting=.
(benji) [admin@kvm5i ~]# benji ls 'uid == "temp-disk0-onw0nk"'
+---------------------+-------------------+------------+-----------------------+---------+------------+--------+-----------+--------------+
|         date        | uid               | volume     | snapshot              |    size | block_size | status | protected | storage      |
+---------------------+-------------------+------------+-----------------------+---------+------------+--------+-----------+--------------+
| 2022-03-15T22:42:44 | temp-disk0-onw0nk | temp-disk0 | b-2022-03-15T22:42:42 | 40.0GiB |     4.0MiB | valid  |   False   | radosgw-kvm7 |
+---------------------+-------------------+------------+-----------------------+---------+------------+--------+-----------+--------------+
(benji) [admin@kvm5i ~]# rbd ls rbd_ssd -l | grep 401
vm-401-disk-0                            40 GiB  rbd_ssd/base-441-disk-0@__base__    2        excl
vm-401-disk-0@b-2022-03-15T22:42:42      40 GiB  rbd_ssd/base-441-disk-0@__base__    2
(benji) [admin@kvm5i ~]# benji deep-scrub -s rbd:rbd_ssd/vm-401-disk-0@b-2022-03-15T22:42:42 temp-disk0-onw0nk
    INFO: Active transforms for storage radosgw-kvm7: zstd (zstd), aes_256_gcm (aes_256_gcm).
    INFO: Deep-scrubbed 16/3176 blocks (0.5%)
    INFO: Deep-scrubbed 32/3176 blocks (1.0%)
<snip>
    INFO: Deep-scrubbed 3168/3176 blocks (99.7%)
    INFO: Deep-scrubbed 3176/3176 blocks (100.0%)
   ERROR: Source data has changed for block 1 of version temp-disk0-onw0nk: It should be sparse but it is not.
   ERROR: Source data has changed for block 2 of version temp-disk0-onw0nk: It should be sparse but it is not.
    INFO: Checked sparse 36/7064 blocks (0.5%)
    INFO: Checked sparse 72/7064 blocks (1.0%)
   ERROR: Source data has changed for block 636 of version temp-disk0-onw0nk: It should be sparse but it is not.
   ERROR: Source data has changed for block 637 of version temp-disk0-onw0nk: It should be sparse but it is not.
    INFO: Checked sparse 108/7064 blocks (1.5%)
    INFO: Checked sparse 144/7064 blocks (2.0%)
    INFO: Checked sparse 180/7064 blocks (2.5%)
    INFO: Checked sparse 216/7064 blocks (3.1%)
    INFO: Checked sparse 252/7064 blocks (3.6%)
    INFO: Checked sparse 288/7064 blocks (4.1%)
   ERROR: Source data has changed for block 1141 of version temp-disk0-onw0nk: It should be sparse but it is not.
   ERROR: Source data has changed for block 1142 of version temp-disk0-onw0nk: It should be sparse but it is not.
   ERROR: Source data has changed for block 1175 of version temp-disk0-onw0nk: It should be sparse but it is not.
    INFO: Checked sparse 324/7064 blocks (4.6%)
    INFO: Checked sparse 360/7064 blocks (5.1%)
<snip>
    INFO: Checked sparse 7020/7064 blocks (99.4%)
    INFO: Checked sparse 7056/7064 blocks (99.9%)
   ERROR: Source data has changed for block 10239 of version temp-disk0-onw0nk: It should be sparse but it is not.
    INFO: Checked sparse 7064/7064 blocks (100.0%)
   ERROR: Version temp-disk0-onw0nk had sparse blocks where the source does not.
   ERROR: Set status of version temp-disk0-onw0nk to invalid.
   ERROR: Marked version temp-disk0-onw0nk as invalid because it has errors.
   ERROR: ScrubbingError: Deep-scrub of version temp-disk0-onw0nk failed.

@elemental-lf
Owner

@bbs2web Okay, I have a hypothesis that I'd like you to check. Could you please remove the --rbd-hints parameter from the call to benji backup in backup_initial(!):

benji_args = [
    'benji', '--machine-output', '--log-level', benji_log_level, 'backup', '--snapshot', snapshot,
    '--rbd-hints', rbd_hints.name
]

That's the whole line 72. Then force an initial backup again and check whether that version is correct. If you have your own scripts, the change should be similar.

My theory is that Ceph might be erroneously reporting these blocks as sparse when they are actually not...

@elemental-lf
Owner

elemental-lf commented Mar 17, 2022

PS: If you're able to provide the output of rbd diff --whole-object --format=json {pool}/{image}@{snapshot}, that would be helpful too. That way we could check whether Ceph is actually lying to us or whether Benji has a bug in reading the hints file.

EDIT: Only if the backup works without the hints of course.
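For reference, rbd diff --format=json emits a list of extents with offset, length and an existence flag; a sketch of turning those extents into per-4-MiB-block hints (the field names and the fact that 'exists' has appeared both as a bool and as the string "true" across Ceph versions are written from memory, so treat them as assumptions):

```python
import json

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB blocks, matching Benji


def hinted_blocks(rbd_diff_json: str):
    """Map rbd diff extents to the set of block indices that hold data.

    Field names ('offset', 'length', 'exists') follow rbd's JSON output as
    remembered; both boolean and string forms of 'exists' are accepted.
    """
    blocks = set()
    for extent in json.loads(rbd_diff_json):
        if extent['exists'] not in (True, 'true'):
            continue  # a hole: nothing to back up for this extent
        first = extent['offset'] // BLOCK_SIZE
        last = (extent['offset'] + extent['length'] - 1) // BLOCK_SIZE
        blocks.update(range(first, last + 1))
    return blocks
```

Comparing this set against the blocks that actually contain non-zero data on the snapshot would distinguish wrong hints from Ceph versus a hint-parsing bug in Benji.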

@elemental-lf
Owner

FYI: All the changes regarding benji deep-scrub --source are now included in the master branch.

@bbs2web
Author

bbs2web commented Mar 20, 2022

Apologies for taking a little while to get these tests done. Running a deep-scrub comparison against the snapshot marked the image status as invalid. The next backup was then automatically a full backup, but a subsequent comparison unfortunately still fails.

Herewith the modification to the ceph.py file:

[admin@kvm5b ~]# diff -uNr /usr/local/benji/lib/python3.9/site-packages/benji/helpers/ceph.py2 /usr/local/benji/lib/python3.9/site-packages/benji/helpers/ceph.py
--- /usr/local/benji/lib/python3.9/site-packages/benji/helpers/ceph.py2 2022-03-20 10:05:23.596598291 +0200
+++ /usr/local/benji/lib/python3.9/site-packages/benji/helpers/ceph.py  2022-03-20 10:05:34.472566156 +0200
@@ -70,7 +70,6 @@
         rbd_hints.flush()
         benji_args = [
             'benji', '--machine-output', '--log-level', benji_log_level, 'backup', '--snapshot', snapshot,
-            '--rbd-hints', rbd_hints.name
         ]
         if version_uid is not None:
             benji_args.extend(['--uid', version_uid])

Herewith the steps:

(benji) [admin@kvm5i ~]# benji ls 'volume == "temp-disk0"' | grep -e ------ -e date -e 2022-03-18
+---------------------+-------------------+------------+-----------------------+---------+------------+--------+-----------+--------------+
|         date        | uid               | volume     | snapshot              |    size | block_size | status | protected | storage      |
+---------------------+-------------------+------------+-----------------------+---------+------------+--------+-----------+--------------+
| 2022-03-18T20:25:30 | temp-disk0-u88xkm | temp-disk0 | b-2022-03-18T20:25:27 | 40.0GiB |     4.0MiB | valid  |   False   | radosgw-kvm7 |
+---------------------+-------------------+------------+-----------------------+---------+------------+--------+-----------+--------------+
(benji) [admin@kvm5i ~]# benji deep-scrub -s rbd:rbd_ssd/vm-401-disk-0@b-2022-03-18T20:25:27 temp-disk0-u88xkm
    INFO: Active transforms for storage radosgw-kvm7: zstd (zstd), aes_256_gcm (aes_256_gcm).
    INFO: Deep-scrubbed 17/3230 blocks (0.5%)
    INFO: Deep-scrubbed 34/3230 blocks (1.1%)
<snip>
    INFO: Deep-scrubbed 3213/3230 blocks (99.5%)
    INFO: Deep-scrubbed 3230/3230 blocks (100.0%)
   ERROR: Source data has changed for block 1 of version temp-disk0-u88xkm: It should be sparse but it is not.
   ERROR: Source data has changed for block 2 of version temp-disk0-u88xkm: It should be sparse but it is not.
    INFO: Checked sparse 36/7010 blocks (0.5%)
    INFO: Checked sparse 72/7010 blocks (1.0%)
   ERROR: Source data has changed for block 636 of version temp-disk0-u88xkm: It should be sparse but it is not.
   ERROR: Source data has changed for block 637 of version temp-disk0-u88xkm: It should be sparse but it is not.
    INFO: Checked sparse 108/7010 blocks (1.5%)
    INFO: Checked sparse 144/7010 blocks (2.1%)
    INFO: Checked sparse 180/7010 blocks (2.6%)
    INFO: Checked sparse 216/7010 blocks (3.1%)
    INFO: Checked sparse 252/7010 blocks (3.6%)
   ERROR: Source data has changed for block 1141 of version temp-disk0-u88xkm: It should be sparse but it is not.
   ERROR: Source data has changed for block 1142 of version temp-disk0-u88xkm: It should be sparse but it is not.
    INFO: Checked sparse 288/7010 blocks (4.1%)
    INFO: Checked sparse 324/7010 blocks (4.6%)
<snip>
    INFO: Checked sparse 6948/7010 blocks (99.1%)
    INFO: Checked sparse 6984/7010 blocks (99.6%)
   ERROR: Source data has changed for block 10239 of version temp-disk0-u88xkm: It should be sparse but it is not.
    INFO: Checked sparse 7010/7010 blocks (100.0%)
   ERROR: Version temp-disk0-u88xkm had sparse blocks where the source does not.
   ERROR: Set status of version temp-disk0-u88xkm to invalid.
   ERROR: Marked version temp-disk0-u88xkm as invalid because it has errors.
   ERROR: ScrubbingError: Deep-scrub of version temp-disk0-u88xkm failed.
(benji) [admin@kvm5i ~]# benji::backup::ceph temp-disk0 rbd_ssd vm-401-disk-0 Troubleshooting;
Snapshot found for rbd_ssd/vm-401-disk-0 is b-2022-03-18T20:25:27.
Existing RBD snapshot not found in Benji, deleting it and reverting to initial backup.
Removing snap: 100% complete...done.
Performing initial backup of temp-disk0:rbd_ssd/vm-401-disk-0.
Creating snap: 100% complete...done.
    INFO: Starting backup of rbd:rbd_ssd/vm-401-disk-0@b-2022-03-20T10:14:34 to storage radosgw-kvm7, the newly created version is temp-disk0-280432.
    INFO: Active transforms for storage radosgw-kvm7: zstd (zstd), aes_256_gcm (aes_256_gcm).
    INFO: Backed up 22/4212 blocks (0.5%)
    INFO: Backed up 44/4212 blocks (1.0%)
<snip>
    INFO: Backed up 4202/4212 blocks (99.8%)
    INFO: Backed up 4212/4212 blocks (100.0%)
    INFO: Set status of version temp-disk0-280432 to valid.
    INFO: Backed up metadata of version temp-disk0-280432.
    INFO: Backup successful.
    INFO: Added label(s) to version temp-disk0-280432: Troubleshooting=.
(benji) [admin@kvm5i ~]# benji ls 'volume == "temp-disk0"' | grep -e ------ -e date -e 2022-03-20
+---------------------+-------------------+------------+-----------------------+---------+------------+---------+-----------+--------------+
|         date        | uid               | volume     | snapshot              |    size | block_size |  status | protected | storage      |
+---------------------+-------------------+------------+-----------------------+---------+------------+---------+-----------+--------------+
| 2022-03-20T10:14:36 | temp-disk0-280432 | temp-disk0 | b-2022-03-20T10:14:34 | 40.0GiB |     4.0MiB |  valid  |   False   | radosgw-kvm7 |
+---------------------+-------------------+------------+-----------------------+---------+------------+---------+-----------+--------------+
(benji) [admin@kvm5i ~]# benji deep-scrub -s rbd:rbd_ssd/vm-401-disk-0@b-2022-03-20T10:14:34 temp-disk0-280432
    INFO: Active transforms for storage radosgw-kvm7: zstd (zstd), aes_256_gcm (aes_256_gcm).
    INFO: Deep-scrubbed 17/3227 blocks (0.5%)
    INFO: Deep-scrubbed 34/3227 blocks (1.1%)
<snip>
    INFO: Deep-scrubbed 3213/3227 blocks (99.6%)
    INFO: Deep-scrubbed 3227/3227 blocks (100.0%)
   ERROR: Source data has changed for block 1 of version temp-disk0-280432: It should be sparse but it is not.
   ERROR: Source data has changed for block 2 of version temp-disk0-280432: It should be sparse but it is not.
    INFO: Checked sparse 36/7013 blocks (0.5%)
    INFO: Checked sparse 72/7013 blocks (1.0%)
    INFO: Checked sparse 108/7013 blocks (1.5%)
   ERROR: Source data has changed for block 636 of version temp-disk0-280432: It should be sparse but it is not.
   ERROR: Source data has changed for block 637 of version temp-disk0-280432: It should be sparse but it is not.
    INFO: Checked sparse 144/7013 blocks (2.1%)
    INFO: Checked sparse 180/7013 blocks (2.6%)
    INFO: Checked sparse 216/7013 blocks (3.1%)
    INFO: Checked sparse 252/7013 blocks (3.6%)
    INFO: Checked sparse 288/7013 blocks (4.1%)
   ERROR: Source data has changed for block 1141 of version temp-disk0-280432: It should be sparse but it is not.
   ERROR: Source data has changed for block 1142 of version temp-disk0-280432: It should be sparse but it is not.
    INFO: Checked sparse 324/7013 blocks (4.6%)
    INFO: Checked sparse 360/7013 blocks (5.1%)
<snip>
    INFO: Checked sparse 6948/7013 blocks (99.1%)
    INFO: Checked sparse 6984/7013 blocks (99.6%)
   ERROR: Source data has changed for block 10239 of version temp-disk0-280432: It should be sparse but it is not.
    INFO: Checked sparse 7013/7013 blocks (100.0%)
   ERROR: Version temp-disk0-280432 had sparse blocks where the source does not.
   ERROR: Set status of version temp-disk0-280432 to invalid.
   ERROR: Marked version temp-disk0-280432 as invalid because it has errors.
   ERROR: ScrubbingError: Deep-scrub of version temp-disk0-280432 failed.
(benji) [admin@kvm5i ~]# benji ls 'volume == "temp-disk0"' | grep -e ------ -e date -e 2022-03-20
+---------------------+-------------------+------------+-----------------------+---------+------------+---------+-----------+--------------+
|         date        | uid               | volume     | snapshot              |    size | block_size |  status | protected | storage      |
+---------------------+-------------------+------------+-----------------------+---------+------------+---------+-----------+--------------+
| 2022-03-20T10:14:36 | temp-disk0-280432 | temp-disk0 | b-2022-03-20T10:14:34 | 40.0GiB |     4.0MiB | invalid |   False   | radosgw-kvm7 |
+---------------------+-------------------+------------+-----------------------+---------+------------+---------+-----------+--------------+

I've uploaded the output of a ceph whole-object diff:
ceph_diff_whole_object_json.txt
It appears to reference only the first object (offset 0) and the object at index 65 (offset 272629760 / 4194304 (4 MiB) = 65).
Section of the log:

[admin@kvm5i ~]# rbd diff --whole-object --format=json rbd_ssd/vm-401-disk-0@b-2022-03-20T10:14:34
[{"offset":0,"length":4194304,"exists":"true"},{"offset":272629760,"length":4194304,"exists":"true"}

I presume Benji's messages about blocks 1 and 2 being different mean that block 0 matches, but that blocks 1 and 2 are incorrectly reported by the diff as not existing?
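For reference, mapping a diff extent's byte offset onto Benji's 4 MiB block indices is plain integer division. A quick sanity check (a sketch, assuming the 4 MiB block size shown in the benji ls output):

```python
BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB, the block size of these versions


def extent_to_blocks(offset: int, length: int, block_size: int = BLOCK_SIZE) -> list:
    """Return the block indices covered by one rbd diff extent."""
    start = offset // block_size
    end = (offset + length - 1) // block_size
    return list(range(start, end + 1))


# The two extents from the rbd diff output above:
print(extent_to_blocks(0, 4194304))          # [0]
print(extent_to_blocks(272629760, 4194304))  # [65]
```

So the diff claims only blocks 0 and 65 exist, which matches the errors reported for every other block that actually holds data.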

@bbs2web
Author

bbs2web commented Mar 20, 2022

Generating SHA1 sums for the 4 MiB blocks of the source image confirms that blocks 0-2 are non-zero (4 MiB of zeros has a SHA1 sum of K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M):

[admin@kvm5i ~]# rbd map rbd_ssd/vm-401-disk-0@b-2022-03-20T10:14:34
/dev/rbd40
[admin@kvm5i ~]# perl -ne 'use Digest::SHA qw(sha1_base64);BEGIN{$/=\4194304};print sha1_base64($_)."\n"' /dev/rbd40 | head -n 66
+qnB9vP9IyPSqfR+ylHxyZHg8uY
ArU8NpEhyVq8OZocpMma5AECdQk
I7Wld9JtQMqPTU8WLGLmu06rJ4w
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M
v07DaRrFDoouMmx5ep9NFqd/pq8
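For anyone without Perl handy, a rough Python equivalent of the one-liner above (a sketch; the device path is whatever rbd map returned, e.g. /dev/rbd40):

```python
import base64
import hashlib
import sys

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB


def sha1_b64(chunk: bytes) -> str:
    # Matches Perl's sha1_base64: base64 without the trailing '=' padding
    return base64.b64encode(hashlib.sha1(chunk).digest()).rstrip(b"=").decode()


def hash_device(path: str) -> None:
    """Print one base64 SHA1 per 4 MiB block of the device."""
    with open(path, "rb") as f:
        while chunk := f.read(BLOCK_SIZE):
            print(sha1_b64(chunk))


if __name__ == "__main__" and len(sys.argv) > 1:
    hash_device(sys.argv[1])
```

The hash of an all-zero 4 MiB block comes out as the K8y9LzjxXBPrfVqJ/Z2F9ZXiO8M value quoted above.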

@bbs2web
Author

bbs2web commented Mar 20, 2022

The backed-up image is a snapshot of a clone.

(benji) [admin@kvm5b ~]# rbd ls rbd_ssd -l | grep -e NAME -e 401
NAME                                   SIZE      PARENT                            FMT  PROT  LOCK
vm-401-disk-0                            40 GiB  rbd_ssd/base-441-disk-0@__base__    2        excl
vm-401-disk-0@b-2022-03-20T11:12:39      40 GiB  rbd_ssd/base-441-disk-0@__base__    2

The clone's parent shows blocks 0, 1 and 2 as defined before jumping ahead to block 65:

(benji) [admin@kvm5b ~]# rbd diff --whole-object --format=json rbd_ssd/base-441-disk-0@__base__
[{"offset":0,"length":4194304,"exists":"true"},{"offset":4194304,"length":4194304,"exists":"true"},{"offset":8388608,"length":4194304,"exists":"true"},{"offset":272629760,

Perhaps there is an error in the Ceph tool's logic when the clone doesn't define blocks itself and they fall through to the parent image?

(benji) [admin@kvm5b ~]# rbd diff --whole-object --format=json rbd_ssd/vm-401-disk-0
[{"offset":0,"length":4194304,"exists":"true"},{"offset":272629760,"length":4194304,"exists":"true"},

@bbs2web
Author

bbs2web commented Mar 20, 2022

I've opened an issue in the Ceph tracker: https://tracker.ceph.com/issues/54970

Any chance you could work with rbd diff output and then pad it to the Ceph object size?
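The padding suggested here could look something like this (a hypothetical sketch, not Benji code; `pad_extent` and the 4 MiB default object size are assumptions for illustration):

```python
OBJECT_SIZE = 4 * 1024 * 1024  # assumed default RBD object size


def pad_extent(offset: int, length: int, object_size: int = OBJECT_SIZE) -> tuple:
    """Expand a plain rbd diff extent outward to full object boundaries."""
    start = (offset // object_size) * object_size
    end = -(-(offset + length) // object_size) * object_size  # ceiling division
    return start, end - start


# A 100-byte extent inside object 65 grows to cover the whole object:
print(pad_extent(272630000, 100))  # (272629760, 4194304)
```

Adjacent padded extents would still need to be merged before use, but this is the basic alignment step.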

@elemental-lf elemental-lf changed the title Restored data does not always match original Restored data does not always match original (Ceph bug) Mar 20, 2022
@elemental-lf
Owner

I really appreciate your thoroughness and the great quality of your analyses. Benji doesn't require --whole-object to work; you can simply leave it off. With --whole-object and the fast-diff feature, however, calculating the difference between images is reduced to a mere metadata operation.
But even if --whole-object isn't used, Benji will do the right thing. See this code:

benji/src/benji/benji.py

Lines 745 to 766 in 82a3e0c

def _blocks_from_hints(hints: Sequence[Tuple[int, int, bool]], block_size: int) -> Tuple[Set[int], Set[int]]:
    sparse_blocks = set()
    read_blocks = set()
    for offset, length, exists in hints:
        start_block = offset // block_size
        end_block = (offset + length - 1) // block_size
        if exists:
            for i in range(start_block, end_block + 1):
                read_blocks.add(i)
        else:
            if offset % block_size > 0:
                # Start block is only partially sparse, make sure it is read
                read_blocks.add(start_block)
            if (offset + length) % block_size > 0:
                # End block is only partially sparse, make sure it is read
                read_blocks.add(end_block)
            for i in range(start_block, end_block + 1):
                sparse_blocks.add(i)
    return sparse_blocks, read_blocks

I've changed the title of this issue slightly so that users browsing the issue won't be scared off from using Benji. I've added a warning message about rbd diff --whole-object to the documentation just now.

@elemental-lf elemental-lf removed the bug Something isn't working label Mar 20, 2022
@elemental-lf elemental-lf pinned this issue Mar 20, 2022
@elemental-lf
Owner

I'm going to close this issue as it has been traced back to Ceph. It will hopefully be fixed in Pacific (16.2.8 and later) and in Quincy when it is released.

@bbs2web
Author

bbs2web commented Jul 11, 2022

This was fixed in Ceph Pacific 16.2.9, many thanks again for this project!
