This feature would let a user enter two locations (e.g. two S3 buckets, or an S3 bucket and a Swift folder), and Motuz would output a list of all files in each location along with their sizes. It could optionally show the set intersection, union, and/or difference, so that a user can find duplicate files (matched on name and file size) or files that are present in one location and NOT in the other. This would help users manage their data more effectively and use their storage more efficiently, e.g. by removing duplicate data or copying over only the missing files.
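To make the set operations concrete, here is a minimal sketch in Python. It assumes each location has already been listed into a dict mapping relative path to size in bytes. The function name `compare_listings` and the result keys are hypothetical, not part of Motuz.

```python
def compare_listings(a, b):
    """Compare two listings, each a dict of relative path -> size in bytes.

    Files are treated as identical when both path and size match,
    as proposed in this issue. All names here are illustrative only.
    """
    items_a = set(a.items())
    items_b = set(b.items())
    return {
        "in_both": sorted(items_a & items_b),    # intersection: likely duplicates
        "only_in_a": sorted(items_a - items_b),  # missing from (or different in) b
        "only_in_b": sorted(items_b - items_a),  # missing from (or different in) a
        "union": sorted(items_a | items_b),      # every (path, size) pair seen
    }
```

A file with the same path but a different size shows up in both `only_in_a` and `only_in_b`, which flags it as changed rather than merely missing.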
Sample implementation:
If I were to compare an S3 bucket to a POSIX file system manually, I would take the following steps:
run "aws s3 ls --recursive --summarize s3://bucket > bucket.txt"
run "ls -alR /path/to/folder > folder.txt"
canonicalize the paths in both bucket.txt and folder.txt so that each line shows the path relative to the root folder/bucket, the file name, and the size in bytes
sort both listings by path and file name
run "diff bucket.txt folder.txt" to see which files are in both locations and which are in only one.
This feature is basically these 5 steps, but between any two arbitrary folders/buckets/etc. in whatever storage systems Motuz supports. If this needs to be submitted as a job whose results the user checks at a later time, that would most likely be fine.
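The listing and canonicalizing steps above could be sketched roughly as follows. This is only an illustration: the helper names `parse_s3_ls` and `walk_local` are hypothetical, the parser assumes the usual `<date> <time> <size> <key>` line layout of `aws s3 ls --recursive`, and the local side uses `os.walk` instead of parsing `ls -alR` output, which is a substitution on my part.

```python
import os

def parse_s3_ls(text, prefix=""):
    """Turn `aws s3 ls --recursive` output into {relative key: size in bytes}.

    Assumes object lines look like "<date> <time> <size> <key>"; anything
    else (blank lines, the --summarize totals) is skipped.
    """
    listing = {}
    for line in text.splitlines():
        parts = line.split(None, 3)  # maxsplit keeps spaces inside the key
        if len(parts) != 4 or not parts[2].isdigit():
            continue
        key = parts[3]
        if prefix and key.startswith(prefix):
            key = key[len(prefix):]  # make the key relative to the prefix
        listing[key] = int(parts[2])
    return listing

def walk_local(root):
    """Return {path relative to root (with "/" separators): size in bytes}."""
    listing = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.isfile(full):  # skip broken symlinks and the like
                continue
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            listing[rel] = os.path.getsize(full)
    return listing
```

Because both helpers return the same path-to-size mapping, the sort and diff steps reduce to comparing two dicts, whichever storage system produced them.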
Nice to have:
compare the file hash if the storage system makes that readily available in the metadata
force the creation of the hash for each file in each location and include this in the results. This could then highlight files with the same name and a different hash or the same hash but a different name/path.
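As a rough sketch of the hash comparison, assuming each location can be reduced to a dict of relative path to hash string: `file_sha256` and `hash_report` are hypothetical names, and SHA-256 is only an example choice. Hashes taken from storage metadata would need to be the same kind on both sides to be comparable (an S3 ETag, for instance, is not always a plain MD5 of the content).

```python
import hashlib
from collections import defaultdict

def file_sha256(path, chunk_size=1 << 20):
    """Hash a local file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def hash_report(a, b):
    """a, b: dicts of relative path -> hash string for each location.

    Returns (paths present in both with different hashes,
             (path_in_a, path_in_b) pairs with equal hashes but different paths).
    """
    same_name_diff_hash = sorted(p for p in a.keys() & b.keys() if a[p] != b[p])

    # Index location b by hash so matches are found without a nested scan.
    by_hash_b = defaultdict(list)
    for path, digest in b.items():
        by_hash_b[digest].append(path)

    same_hash_diff_name = sorted(
        (pa, pb)
        for pa, digest in a.items()
        for pb in by_hash_b.get(digest, [])
        if pa != pb
    )
    return same_name_diff_hash, same_hash_diff_name
```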