Checksum update fails for replica where actual data file size differs from registered size #7724

Open
tsmeele opened this issue Apr 29, 2024 · 17 comments

Comments

@tsmeele

tsmeele commented Apr 29, 2024

Bug Report

iRODS Version, OS and Version

iRODS 4.2.12 on Centos7

What did you try to do?

Update the ICAT checksum for a replica whose data file has meanwhile decreased in size.

Expected behavior

Replica checksum attribute in ICAT is updated with calculated checksum of related data file.

Observed behavior (including steps to reproduce, if applicable)

When the data file on disk is smaller than the file size registered for the related replica, iRODS fails to calculate the checksum and instead reports a file read error (UNIX_FILE_READ_ERR).

irods@la0019 /var/lib/irods/ton /nluu9a/home/rods/ton $ ls -l testfile*
-rw-rw-r--. 1 irods irods 20 Apr 29 13:02 testfile
-rw-rw-r--. 1 irods irods  6 Apr 29 13:02 testfile.shorter
irods@la0019 /var/lib/irods/ton /nluu9a/home/rods/ton $ iput testfile
irods@la0019 /var/lib/irods/ton /nluu9a/home/rods/ton $ ichksum testfile
    testfile    sha2:gzwmfp2cdTpQTrNJFGB1nx//va0Xytl7a2cwilZ3dp8=
irods@la0019 /var/lib/irods/ton /nluu9a/home/rods/ton $ ils -L testfile
  rods              0 irodsResc;la0019;la0019_p02;la0019_02           20 2024-04-29.13:03 & testfile
    sha2:gzwmfp2cdTpQTrNJFGB1nx//va0Xytl7a2cwilZ3dp8=    generic    /mnt/irods02/Vault/home/rods/ton/testfile
irods@la0019 /var/lib/irods/ton /nluu9a/home/rods/ton $ cat testfile.shorter >/mnt/irods02/Vault/home/rods/ton/testfile
irods@la0019 /var/lib/irods/ton /nluu9a/home/rods/ton $ ils -L testfile
  rods              0 irodsResc;la0019;la0019_p02;la0019_02           20 2024-04-29.13:03 & testfile
    sha2:gzwmfp2cdTpQTrNJFGB1nx//va0Xytl7a2cwilZ3dp8=    generic    /mnt/irods02/Vault/home/rods/ton/testfile
irods@la0019 /var/lib/irods/ton /nluu9a/home/rods/ton $ ichksum testfile
    testfile    sha2:gzwmfp2cdTpQTrNJFGB1nx//va0Xytl7a2cwilZ3dp8=
irods@la0019 /var/lib/irods/ton /nluu9a/home/rods/ton $ ichksum -f testfile
remote addresses: 127.0.0.1 ERROR: chksumDataObjUtil: rcDataObjChksum error for /nluu9a/home/rods/ton/testfile status = -512000 UNIX_FILE_READ_ERR
remote addresses: 127.0.0.1 ERROR: chksumUtil: chksum error for /nluu9a/home/rods/ton/testfile, status = -512000 status = -512000 UNIX_FILE_READ_ERR
irods@la0019 /var/lib/irods/ton /nluu9a/home/rods/ton $ ienv
irods_version - 4.2.12

The error only happens if the data file is smaller than the size registered for the replica.
Here is an example where the data file is larger than the size registered for the replica. Now the checksum calculation completes without error.

irods@la0019 /var/lib/irods/ton /nluu9a/home/rods/ton $ iput testfile.shorter
irods@la0019 /var/lib/irods/ton /nluu9a/home/rods/ton $ ichksum testfile.shorter
    testfile.shorter    sha2:r+TyC1y40W9n6Mh0tUVKAvkdTx37rRoGB+GqdgREaw8=
irods@la0019 /var/lib/irods/ton /nluu9a/home/rods/ton $ ils -L testfile.shorter
  rods              0 irodsResc;la0019;la0019_p02;la0019_02            6 2024-04-29.13:10 & testfile.shorter
    sha2:r+TyC1y40W9n6Mh0tUVKAvkdTx37rRoGB+GqdgREaw8=    generic    /mnt/irods02/Vault/home/rods/ton/testfile.shorter
irods@la0019 /var/lib/irods/ton /nluu9a/home/rods/ton $ cat testfile >/mnt/irods02/Vault/home/rods/ton/testfile.shorter
irods@la0019 /var/lib/irods/ton /nluu9a/home/rods/ton $ ichksum -f testfile.shorter
    testfile.shorter    sha2:IkWMmNEhMhMk1B0W9ifxhS9i+1RUv0Qzjw2yuI3y+SQ=
irods@la0019 /var/lib/irods/ton /nluu9a/home/rods/ton $
@tsmeele tsmeele changed the title ichksum -f fails for replica where actual data file size is smaller than registered size Checksum update fails for replica where actual data file size is smaller than registered size Apr 29, 2024
@trel
Member

trel commented Apr 29, 2024

We’ll need to check whether this happens in 4.3.1.

@tsmeele
Author

tsmeele commented Apr 29, 2024

issue still exists for 4.3.1 (on a vanilla iRODS server, just created)

irods@ubuntu2004sudo:~/ton$ ls -l
total 8
-rw-rw-r-- 1 irods irods 13 Apr 29 15:33 testfile
-rw-rw-r-- 1 irods irods  5 Apr 29 15:34 testfile.shorter
irods@ubuntu2004sudo:~/ton$ iput testfile
irods@ubuntu2004sudo:~/ton$ ichksum testfile
    testfile    sha2:68chYPpvzc51Hl1ymMjfWD5hMe7BYfr1IJQVngXG01A=
irods@ubuntu2004sudo:~/ton$ ils -L testfile
  rods              0 demoResc           13 2024-04-29.15:34 & testfile
    sha2:68chYPpvzc51Hl1ymMjfWD5hMe7BYfr1IJQVngXG01A=    generic    /var/lib/irods/Vault/home/rods/testfile
irods@ubuntu2004sudo:~/ton$ cat testfile.shorter >/var/lib/irods/Vault/home/rods/testfile
irods@ubuntu2004sudo:~/ton$ ichksum testfile
    testfile    sha2:68chYPpvzc51Hl1ymMjfWD5hMe7BYfr1IJQVngXG01A=
irods@ubuntu2004sudo:~/ton$ ichksum -f testfile
remote addresses: 127.0.0.1 ERROR: chksumDataObjUtil: rcDataObjChksum error for /tempZone/home/rods/testfile status = -512000 UNIX_FILE_READ_ERR
remote addresses: 127.0.0.1 ERROR: chksumUtil: chksum error for /tempZone/home/rods/testfile, status = -512000 status = -512000 UNIX_FILE_READ_ERR
irods@ubuntu2004sudo:~/ton$ ienv
irods_version - 4.3.1

@trel
Member

trel commented Apr 29, 2024

Well, there we go. Bug!

@trel trel added this to the 4.3.3 milestone Apr 29, 2024
@tsmeele
Author

tsmeele commented Apr 29, 2024

Further analysis reveals that the checksum is recalculated only over the initial bytes of the data file, up to the file size registered for the replica. Hence, for data files that were enlarged independently of iRODS, iRODS will calculate and register an incorrect checksum. See the example below, where the recalculated checksum should differ from the previous checksum but does not:

iput testfile file
irods@ubuntu2004sudo:~/ton$ ils -L
/tempZone/home/rods:
  rods              0 demoResc           13 2024-04-29.17:07 & file
        generic    /var/lib/irods/Vault/home/rods/file
irods@ubuntu2004sudo:~/ton$ ls -l /var/lib/irods/Vault/home/rods/file
-rw------- 1 irods irods 13 Apr 29 17:07 /var/lib/irods/Vault/home/rods/file
irods@ubuntu2004sudo:~/ton$ ichksum file
    file    sha2:68chYPpvzc51Hl1ymMjfWD5hMe7BYfr1IJQVngXG01A=
irods@ubuntu2004sudo:~/ton$ ils -L
/tempZone/home/rods:
  rods              0 demoResc           13 2024-04-29.17:07 & file
    sha2:68chYPpvzc51Hl1ymMjfWD5hMe7BYfr1IJQVngXG01A=    generic    /var/lib/irods/Vault/home/rods/file
irods@ubuntu2004sudo:~/ton$ cat testfile testfile >/var/lib/irods/Vault/home/rods/file
irods@ubuntu2004sudo:~/ton$ ls -l /var/lib/irods/Vault/home/rods/file
-rw------- 1 irods irods 26 Apr 29 17:09 /var/lib/irods/Vault/home/rods/file
irods@ubuntu2004sudo:~/ton$ ichksum -f file
    file    sha2:68chYPpvzc51Hl1ymMjfWD5hMe7BYfr1IJQVngXG01A=
irods@ubuntu2004sudo:~/ton$ ils -L
/tempZone/home/rods:
  rods              0 demoResc           13 2024-04-29.17:07 & file
    sha2:68chYPpvzc51Hl1ymMjfWD5hMe7BYfr1IJQVngXG01A=    generic    /var/lib/irods/Vault/home/rods/file
irods@ubuntu2004sudo:~/ton$ 
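
The prefix-only behavior in the listing above can be reproduced outside iRODS. The following is a minimal Python sketch, not the iRODS implementation: `bounded_sha2` is a hypothetical helper that hashes only the first `registered_size` bytes of a file and fails when the file ends early. It assumes the `sha2:` value is the base64 encoding of a SHA-256 digest, and the file contents are made up for illustration.

```python
import base64
import hashlib
import os
import tempfile


def bounded_sha2(path: str, registered_size: int) -> str:
    """Checksum only the first `registered_size` bytes of `path`.

    Hypothetical helper modelling the behavior observed in this thread;
    it is not the iRODS implementation.
    """
    digest = hashlib.sha256()
    remaining = registered_size
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(remaining, 1 << 20))
            if not chunk:
                # The file ended before the registered size was reached.
                raise IOError("file is shorter than the registered size")
            digest.update(chunk)
            remaining -= len(chunk)
    # Assumed format: "sha2:" followed by the base64-encoded SHA-256 digest.
    return "sha2:" + base64.b64encode(digest.digest()).decode("ascii")


def demo() -> tuple:
    """Return (checksum unchanged after growth, shrunk file fails)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file")
        with open(path, "wb") as f:
            f.write(b"hello world!\n")        # 13 bytes, the registered size
        before = bounded_sha2(path, 13)
        with open(path, "wb") as f:
            f.write(b"hello world!\n" * 2)    # grown to 26 bytes behind the catalog
        after_growth = bounded_sha2(path, 13)
        with open(path, "wb") as f:
            f.write(b"hello")                 # shrunk to 5 bytes
        try:
            bounded_sha2(path, 13)
            shrunk_fails = False
        except IOError:
            shrunk_fails = True
    return before == after_growth, shrunk_fails
```

Calling `demo()` exercises both cases from this issue: the enlarged file keeps its old checksum because the extra bytes are never read, and the shortened file cannot be read up to the registered size.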

Now the good news is that a likely quick fix could be to first update the replica's filesize attribute using data file stat info, then proceed to calculate the checksum.

@trel
Member

trel commented Apr 29, 2024

Agreed - this makes the error (and fix) more consistent. Thanks.

@tsmeele tsmeele changed the title Checksum update fails for replica where actual data file size is smaller than registered size Checksum update fails for replica where actual data file size differs from registered size Apr 29, 2024
@alanking
Contributor

If we update the size of the replica in the catalog, I'm wondering what that means for any sibling replicas which may still reflect the information in the catalog. Should the untouched sibling replicas be considered stale? Or should the replica found to be different from what is recorded be considered stale?

@trel
Member

trel commented Apr 29, 2024

I think with an ichksum -f, perhaps we just stat() the replica first, then read that number of bytes? Ignore the value in the catalog?

Oh... wait... the catalog would still have the wrong size information - so it's not for ichksum to update that field in the catalog? Or is it?

What is the right way to tell the catalog to update its filesize information for a registered replica?
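
For illustration only, the "stat() first, then read that number of bytes" idea could look roughly like the Python sketch below. `stat_then_checksum` is a hypothetical name, not an iRODS function, and the sketch deliberately leaves open the question raised above: it reports the size found in storage but does not decide whether the catalog should be updated. The sizes 3 and 8 are example values.

```python
import base64
import hashlib
import os
import tempfile


def stat_then_checksum(path: str) -> tuple:
    """Return (size_in_storage, checksum), ignoring any catalog size.

    Hypothetical helper for discussion; not an iRODS API.
    """
    size = os.stat(path).st_size              # size as seen in storage
    digest = hashlib.sha256()
    remaining = size
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(remaining, 1 << 20))
            if not chunk:                     # file shrank while being read
                break
            digest.update(chunk)
            remaining -= len(chunk)
    # Assumed format: "sha2:" followed by the base64-encoded SHA-256 digest.
    return size, "sha2:" + base64.b64encode(digest.digest()).decode("ascii")


def demo() -> tuple:
    """Return (size in storage, differs from catalog?, checksum)."""
    catalog_size = 8                          # stale example value from a catalog row
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "another.txt")
        with open(path, "wb") as f:
            f.write(b"abc")                   # 3 bytes actually in storage
        size, checksum = stat_then_checksum(path)
    return size, size != catalog_size, checksum
```

The caller still has to choose what to do when the two sizes differ, which is exactly the open design question in this thread.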

@alanking
Contributor

Oh, I see... We are already updating the checksum (via -f?) and so updating the size while we're at it wouldn't be much different. We can just do whatever it normally does if there's a difference.

@trel
Member

trel commented Apr 29, 2024

Right, that's the question. Would updating the size break any assumptions by other moving parts?

@tsmeele
Author

tsmeele commented Apr 29, 2024

Now updating attributes might cause an update of the replica's modify_ts right? Would that have any impact on decisions regarding good/stale status of the replica and its siblings?

@trel
Member

trel commented Apr 29, 2024

I don't think updating with ichksum -f should update the status of the replica... if it was stale, it should remain stale... and if it was good, it should remain good...

(is that true? ichksum -f of a stale replica of the correct size... only adds the checksum to the catalog... investigating...)
(answer: yes, see below)

Some testing with 4.3.1...

The file on disk for replica 1 is only 3 bytes:

$ ls -l /tmp/thechildvault/home/rods/another.txt
-rw------- 1 irods irods 3 Apr 29 14:14 /tmp/thechildvault/home/rods/another.txt

But the catalog shows 8 bytes:

irods$ ils -L another.txt
  rods              0 demoResc            5 2024-04-29.14:08 & another.txt
    sha2:I5YJnGwIT6S5vqyfDVLPO+nPjUcEDvEniD1TK1eQzXQ=    generic    /var/lib/irods/Vault/home/rods/another.txt
  rods              1 thechild            8 2024-04-29.14:07 X another.txt
    sha2:M6eyFQZfLuhjXvtyYgvCaaHvuIm6MCZWAzTac2Z0I3Q=    generic    /tmp/thechildvault/home/rods/another.txt

Checksum fails with UNIX_FILE_READ_ERR:

irods$ ichksum -f -n1 another.txt
remote addresses: 127.0.0.1 ERROR: chksumDataObjUtil: rcDataObjChksum error for /tempZone/home/rods/another.txt status = -512000 UNIX_FILE_READ_ERR
remote addresses: 127.0.0.1 ERROR: chksumUtil: chksum error for /tempZone/home/rods/another.txt, status = -512000 status = -512000 UNIX_FILE_READ_ERR

And the logs in the server...

 {"log_category":"legacy","log_level":"info","log_message":"[+]\t/irods_source/server/api/src/rsFileChksum.cpp:350:int file_checksum(RsComm *, const char *, const char *, const char *, const char *, rodsLong_t, char *) :  status [Unknown iRODS error]  errno [] -- message [file_checksum - fileRead failed for [/tmp/thechildvault/home/rods/another.txt].]\n\n","request_api_name":"DATA_OBJ_CHKSUM_AN","request_api_number":629,"request_api_version":"d","request_client_user":"rods","request_host":"127.0.0.1","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"xxx.xxx.xxx","server_pid":3831197,"server_timestamp":"2024-04-29T18:14:40.305Z","server_type":"agent","server_zone":"tempZone"}
 {"log_category":"legacy","log_level":"error","log_message":"file_checksum - The size of the replica recorded in the catalog is greater than the size in storage.","request_api_name":"DATA_OBJ_CHKSUM_AN","request_api_number":629,"request_api_version":"d","request_client_user":"rods","request_host":"127.0.0.1","request_proxy_user":"rods","request_release_version":"rods4.3.1","server_host":"xxx.xxx.xxx","server_pid":3831197,"server_timestamp":"2024-04-29T18:14:40.305Z","server_type":"agent","server_zone":"tempZone"}

Most interesting is this line...

"log_message":"file_checksum - The size of the replica recorded in the catalog is greater than the size in storage."

Then, updating the catalog to be 'smaller' than the file in storage...

irods$ iadmin modrepl logical_path /tempZone/home/rods/another.txt replica_number 1 DATA_SIZE 1

The catalog now shows a 1-byte file for replica 1:

irods$ ils -L another.txt
  rods              0 demoResc            5 2024-04-29.14:08 & another.txt
    sha2:I5YJnGwIT6S5vqyfDVLPO+nPjUcEDvEniD1TK1eQzXQ=    generic    /var/lib/irods/Vault/home/rods/another.txt
  rods              1 thechild            1 2024-04-29.14:07 X another.txt
    sha2:M6eyFQZfLuhjXvtyYgvCaaHvuIm6MCZWAzTac2Z0I3Q=    generic    /tmp/thechildvault/home/rods/another.txt

Running ichksum -f again, this time, succeeds...

irods$ ichksum -f -n1 another.txt
    another.txt    sha2:qqlAJmTxpB9A67xSyZk+tmrrNmYClY/fqig7ceZNsSM=

irods$ ils -L another.txt
  rods              0 demoResc            5 2024-04-29.14:08 & another.txt
    sha2:I5YJnGwIT6S5vqyfDVLPO+nPjUcEDvEniD1TK1eQzXQ=    generic    /var/lib/irods/Vault/home/rods/another.txt
  rods              1 thechild            1 2024-04-29.14:07 X another.txt
    sha2:qqlAJmTxpB9A67xSyZk+tmrrNmYClY/fqig7ceZNsSM=    generic    /tmp/thechildvault/home/rods/another.txt

So... maybe nothing needs to happen?

The server reports when it cannot read the full registered size of the replica (because the file on disk is smaller than the size in the catalog).

@trel
Member

trel commented Apr 29, 2024

What is the right way to tell the catalog to update its filesize information for a registered replica?

If something is touching the vault, and that's your design goal... then you have to manage the expectations that the filesizes need to be updated as well.

You can do that with iadmin modrepl OR with a trim and re-registration of that replica.


I'm now thinking ichksum should not be in the business of updating filesizes in the catalog.
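
As a rough sketch of the `iadmin modrepl` route, an administrator who changes files in the vault could stat the physical file and then issue the two commands already used earlier in this thread. The Python below only builds the argument lists and does not run them. `resync_commands` is a hypothetical helper name, and the paths are the example values from this thread.

```python
import os
import tempfile


def resync_commands(logical_path: str, replica_number: int, physical_path: str) -> list:
    """Build the commands that record the on-disk size, then refresh the checksum.

    Hypothetical helper: it only returns argument lists and runs nothing.
    The command forms are the ones shown earlier in this thread.
    """
    size = os.stat(physical_path).st_size     # size currently in storage
    return [
        ["iadmin", "modrepl", "logical_path", logical_path,
         "replica_number", str(replica_number), "DATA_SIZE", str(size)],
        ["ichksum", "-f", f"-n{replica_number}", logical_path],
    ]


def demo() -> list:
    """Build the commands for a 3-byte example file standing in for the vault file."""
    with tempfile.TemporaryDirectory() as tmp:
        physical = os.path.join(tmp, "another.txt")
        with open(physical, "wb") as f:
            f.write(b"abc")                   # 3 bytes, as in the test above
        return resync_commands("/tempZone/home/rods/another.txt", 1, physical)
```

Each returned list could be passed to `subprocess.run` by someone with the required iRODS privileges. Ordering matters here: the size is recorded first so that the subsequent checksum covers the whole file.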

@tsmeele
Author

tsmeele commented Apr 29, 2024

According to the iRODS design [Rajasekar et al. 2006], the dirty bit (hence stale/good status) facilitates synchronization of replicas after changing one of the copies. This implies that status changes applied to the replicas of a data object should be linked to requests that write new data to a particular replica. This would suggest that changes to replica state stale/good should not be linked to checksum verify/update actions.

In the case where there is a discrepancy between the replica attributes held in the ICAT and the data file attributes, we typically expect a media failure or equivalent. Checksum verification allows us to find such discrepancies, where we regard the ICAT as the trusted source.
As an exception, a deliberate user action to update the ICAT with a checksum calculated from a data file is an attempt to ensure that the replica attributes truthfully reflect the attributes of the data file.

In this light, one could argue that the modify_ts of a replica should reflect the last-modified timestamp of the data file, and should not change due to metadata operations on checksum attributes (or file size attributes).

@tsmeele
Author

tsmeele commented Apr 29, 2024

On the experiment changing the file size in the ICAT: this would indeed lead to a modified checksum, as fewer bytes are involved in the calculation. The resulting checksum, however, does not properly reflect the content of the entire data file, even though 'optically' it seems okay.

@trel
Member

trel commented Apr 29, 2024

I agree that touching a checksum should not affect the modify_ts of the replica. And I think the current behavior matches that expectation.

The checksum matches the content contained within the filesize-in-the-catalog. I think this is also correct and good.

If the ichksum -f is not allowed to 'fix' the filesize (which I currently believe it should not), then everything seems to be behaving as expected.

Your original posting above has...

Expected behavior
Replica checksum attribute in ICAT is updated with calculated checksum of related data file.

a) Do you agree we cannot / should not update the checksum without also updating the filesize in the catalog?

b) Do you agree that we should not touch the filesize in the catalog with this operation?

And if yes to both... then there is nothing to do here?

@tsmeele
Author

tsmeele commented Apr 30, 2024

Indeed, the current behavior seems to be consistent after all. iRODS manages only the data contained in the part of the data file bounded by the size attribute of the replica. This makes sense when one regards the data file as an arbitrarily sized container. I tested other client tools such as iget and istream and they respect this boundary as well: only the 'managed' part of a data file is downloaded.

Hence the conclusion must be that this is not a bug, it is intended behavior. Agree with points a and b. It might be worthwhile to add a warning in ichksum documentation and the related microservice to clarify that the checksum value only represents the part of the data file indicated by the replica size attribute, not the entire datafile.
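
To make that warning concrete, here is a small Python sketch that classifies how much of a data file a checksum would cover, given the size recorded in the catalog. `checksum_coverage` and its labels are hypothetical names chosen for illustration, and the sizes in `demo` are example values.

```python
import os
import tempfile


def checksum_coverage(catalog_size: int, physical_path: str) -> str:
    """Classify what a checksum bounded by `catalog_size` would cover.

    Hypothetical helper summarizing the cases discussed in this thread.
    """
    storage_size = os.stat(physical_path).st_size
    if storage_size < catalog_size:
        # Fewer bytes in storage than registered: the read fails.
        return "read-error"
    if storage_size > catalog_size:
        # Trailing bytes beyond the registered size are not checksummed.
        return "partial"
    return "complete"


def demo() -> list:
    """Classify a 13-byte example file against three catalog sizes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "testfile")
        with open(path, "wb") as f:
            f.write(b"hello world!\n")        # 13 bytes in storage
        return [checksum_coverage(size, path) for size in (20, 6, 13)]
```

Only the "complete" case yields a checksum that represents the entire data file; in the "partial" case the value is valid for the managed part only.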

@trel
Member

trel commented Apr 30, 2024

Excellent.

It might be worthwhile to add a warning in ichksum documentation and the related microservice to clarify that the checksum value only represents the part of the data file indicated by the replica size attribute, not the entire datafile.

We can do that. Marking this as a documentation issue.

@trel trel added documentation and removed bug labels Apr 30, 2024