Checksum update fails for replica where actual data file size differs from registered size #7724
Comments
We’ll need to check whether this happens in 4.3.1. |
issue still exists for 4.3.1 (on a vanilla iRODS server, just created)
|
Well, there we go. Bug! |
Further analysis reveals that the checksum is recalculated only over the initial bytes of the data file, up to the file size registered with the replica. Hence, for data files that were enlarged independently of iRODS, iRODS will calculate and register an incorrect checksum. See example below, where the recalculated checksum should differ from the previous checksum but does not:
Now the good news is that a likely quick fix could be to first update the replica's filesize attribute using data file stat info, then proceed to calculate the checksum. |
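The behavior described above can be illustrated with a minimal sketch. It is not the iRODS implementation: `bounded_checksum` and `demo` are hypothetical names, and the `sha2:` plus base64 output format is only an assumption about how the value is presented. The point is that hashing only the first `registered_size` bytes yields the same value after the file has been enlarged.

```python
import base64
import hashlib
import os
import tempfile


def bounded_checksum(path, registered_size, chunk_size=1 << 20):
    """Hash only the first `registered_size` bytes of the file at `path`.

    Hypothetical helper: it mimics a checksum that is bounded by the
    size recorded in the catalog rather than by the size on disk.
    """
    digest = hashlib.sha256()
    remaining = registered_size
    with open(path, "rb") as f:
        while remaining > 0:
            chunk = f.read(min(chunk_size, remaining))
            if not chunk:
                break  # file is shorter than registered_size; not handled here
            digest.update(chunk)
            remaining -= len(chunk)
    # Assumed presentation: "sha2:" followed by the base64-encoded digest.
    return "sha2:" + base64.b64encode(digest.digest()).decode("ascii")


def demo():
    """Return the checksum before and after enlarging the file on disk."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "datafile")
        with open(path, "wb") as f:
            f.write(b"12345678")  # 8 bytes, matching the registered size
        before = bounded_checksum(path, registered_size=8)
        with open(path, "ab") as f:
            f.write(b" appended outside of the catalog's knowledge")
        after = bounded_checksum(path, registered_size=8)
        return before, after


if __name__ == "__main__":
    before, after = demo()
    print("checksum unchanged after enlarging the file:", before == after)
```

Because the appended bytes lie beyond the registered size, they never reach the hash function, which is why the two values are identical.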
Agreed - this makes the error (and fix) more consistent. Thanks. |
If we update the size of the replica in the catalog, I'm wondering what it means about any sibling replicas which may still reflect the information in the catalog. Should the untouched sibling replicas be considered stale? Or should the replica found to be different from what is recorded be considered stale? |
I think with an Oh... wait... the catalog would still have the wrong size information - so it's not for What is the right way to tell the catalog to update its filesize information for a registered replica? |
Oh, I see... We are already updating the checksum (via |
Right, that's the question. Would updating the size break any assumptions by other moving parts? |
Now updating attributes might cause an update of the replica's modify_ts right? Would that have any impact on decisions regarding good/stale status of the replica and its siblings? |
I don't think updating with (is that true? Some testing with 4.3.1... The file on disk for replica 1 is only 3 bytes:
But the catalog shows 8 bytes:
Checksum fails with
And the logs in the server...
Most interesting is this line...
Then, updating the catalog to be 'smaller' than the file in storage...
The catalog now shows a 1-byte file for replica 1:
Running
So... maybe nothing needs to happen? The server reports when it cannot read the entirety of the file (because the file on disk is larger than the size in the catalog). |
If something is touching the vault, and that's your design goal... then you have to manage the expectations that the filesizes need to be updated as well. You can do that with I'm now thinking |
According to the iRODS design [Rajasekar et al. 2006], the dirty bit (hence stale/good status) facilitates synchronization of replicas after changing one of the copies. This implies that status changes applied to the replicas of a data object should be linked to requests that write new data to a particular replica. This would suggest that changes to replica state stale/good should not be linked to checksum verify/update actions. In the case where there is a discrepancy between ICAT-held replica attributes and data file attributes, we typically expect a media failure or equivalent. Checksum verification allows us to find such discrepancies, where we regard the ICAT as the trusted source. In this light, one could argue that the modify_ts of a replica should reflect the last-modified timestamp of the data file, and not change due to metadata operations on checksum attributes (or file size attributes). |
On the experiment changing the file size in the ICAT: this would indeed lead to a modified checksum, as fewer bytes are involved in the calculation. The resulting checksum, however, does not properly reflect the content of the entire data file, even though 'optically' it seems okay. |
I agree that touching a checksum should not affect the modify_ts of the replica. And I think the current behavior matches that expectation. The checksum matches the content contained within the filesize-in-the-catalog. I think this is also correct and good. If the Your original posting above has...
a) Do you agree we cannot / should not update the checksum without also updating the filesize in the catalog? b) Do you agree that we should not touch the filesize in the catalog with this operation? And if yes to both... then there is nothing to do here? |
Indeed, the current behavior seems to be consistent after all. iRODS manages only the data contained in the part of the data file bounded by the size attribute of the replica. This makes sense when one regards the data file as an arbitrarily sized container. I tested other client tools such as iget and istream and they respect this boundary as well: only the 'managed' part of a data file is downloaded. Hence the conclusion must be that this is not a bug but intended behavior. Agree with points a and b. It might be worthwhile to add a warning to the ichksum documentation and the related microservice to clarify that the checksum value only represents the part of the data file indicated by the replica size attribute, not the entire data file. |
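For sites where something other than iRODS touches the vault, a size comparison can flag replicas whose checksum would not cover the whole data file. The following is a rough sketch under stated assumptions: `classify_size_discrepancy` is a hypothetical helper, and the registered size is assumed to have been obtained from the catalog by some other means. It does not call any iRODS API.

```python
import os
import tempfile


def classify_size_discrepancy(path, registered_size):
    """Compare the on-disk size of `path` with the size in the catalog.

    Hypothetical helper. Returns one of:
      "match"  - sizes agree; a bounded checksum covers the whole file
      "longer" - file is larger; a bounded checksum covers only a prefix
      "short"  - file is smaller; reading the registered size will fail
    """
    actual_size = os.stat(path).st_size
    if actual_size == registered_size:
        return "match"
    if actual_size > registered_size:
        return "longer"
    return "short"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "datafile")
        with open(path, "wb") as f:
            f.write(b"abc")  # 3 bytes on disk
        for size_in_catalog in (8, 3, 1):
            print(size_in_catalog, classify_size_discrepancy(path, size_in_catalog))
```

A "longer" result corresponds to the case discussed in this thread, where the checksum looks valid but describes only the managed part of the file.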
Excellent.
We can do that. Marking this as a documentation issue. |
Bug Report
iRODS Version, OS and Version
iRODS 4.2.12 on Centos7
What did you try to do?
Update the ICAT checksum for a replica whose data file contents have meanwhile decreased in size.
Expected behavior
Replica checksum attribute in ICAT is updated with calculated checksum of related data file.
Observed behavior (including steps to reproduce, if applicable)
When the data file on disk has a smaller file size than the file size registered with the related replica then iRODS fails to calculate the checksum and instead reports a FILE READ error.
The error only happens if the filesize of the data file is shorter than the replica registered size.
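The asymmetry can be sketched as follows, assuming the server tries to read exactly as many bytes as the replica's registered size. `ShortReadError` and `read_registered_bytes` are hypothetical names standing in for the server's read path and its FILE READ error; this is an illustration, not iRODS code.

```python
import os
import tempfile


class ShortReadError(Exception):
    """Hypothetical stand-in for the FILE READ error reported by the server."""


def read_registered_bytes(path, registered_size):
    """Read exactly `registered_size` bytes from `path`, or raise.

    Surplus bytes beyond `registered_size` are silently ignored, which
    is why only a too-short data file produces an error.
    """
    with open(path, "rb") as f:
        data = f.read(registered_size)
    if len(data) < registered_size:
        raise ShortReadError(
            f"expected {registered_size} bytes, but only {len(data)} could be read"
        )
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "datafile")
        with open(path, "wb") as f:
            f.write(b"abc")  # 3 bytes on disk

        # Registered size smaller than the file: succeeds, reads a prefix.
        print(read_registered_bytes(path, 1))

        # Registered size larger than the file: fails with a short read.
        try:
            read_registered_bytes(path, 8)
        except ShortReadError as err:
            print("read failed:", err)
```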
Here is an example where the data file has a greater size than the replica registered size. Now checksum calculation behaves as expected.