File listing Optimization Proposal #691

Open
sculptex opened this issue May 23, 2022 · 0 comments

sculptex commented May 23, 2022

Retrieving file listings, and in particular list-all, will become increasingly resource-hungry as the number of files and directories in an allocation grows.

Separate from adding a folders-only option, here is an idea for how file listings could be done more efficiently for large numbers of files (compared with the apparent current method, where the client requests listings from all blobbers and seeks a sufficient consensus majority to consider a listing correct):

  • Instead of sending the entire file listing, each blobber creates an internal list containing only file paths and content hashes (consistent between blobbers)
  • This list is sorted
  • A hash of this list is generated (consistent between blobbers)
  • This list (referenced by its hash) is saved (temporarily)
  • The hash is what is returned to client initially
  • The client only has to compare hashes from a majority (consensus) of blobbers to ensure file-listing correctness
  • The client then only has to retrieve the actual file listing from one random blobber among those whose hashes match
  • This can be done with pagination, solving the consistency issue when time elapses between pages
  • In fact, pagination requests can be split across blobbers: blobber 1 serves page 1, blobber 2 serves page 2, etc.
  • The cached list can be retained at least for a short while for pagination requests, but actually remains valid until a write operation is performed on the allocation. So it could be flagged as stale as soon as a CRUD operation is performed, and the stale listing removed after, say, 1 minute
  • This could form a secondary method of file listing above a certain threshold, perhaps for any allocation with more than, say, 1,000 files, or wherever pagination would kick in
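The two halves of the scheme above (blobber-side list hashing and client-side consensus check) could be sketched roughly as follows. This is an illustrative sketch only, not the 0chain API; all function and parameter names here are hypothetical.

```python
import hashlib
import random


def listing_hash(entries):
    """Blobber side (hypothetical): build the sorted path/content-hash
    list and hash it.

    `entries` maps file path -> content hash. Sorting by path makes the
    serialized list, and therefore its hash, identical across blobbers
    holding the same allocation state.
    """
    lines = ["%s:%s" % (path, entries[path]) for path in sorted(entries)]
    serialized = "\n".join(lines)
    digest = hashlib.sha256(serialized.encode()).hexdigest()
    return digest, lines


def pick_listing_source(blobber_hashes, threshold):
    """Client side (hypothetical): find the majority hash and pick one
    matching blobber at random to fetch the actual (paginated) listing.

    `blobber_hashes` maps blobber id -> listing hash returned by that
    blobber; `threshold` is the consensus count required. Returns None
    when no hash reaches the threshold.
    """
    counts = {}
    for h in blobber_hashes.values():
        counts[h] = counts.get(h, 0) + 1
    majority_hash = max(counts, key=counts.get)
    if counts[majority_hash] < threshold:
        return None  # no consensus; caller would fall back to full comparison
    matching = [b for b, h in blobber_hashes.items() if h == majority_hash]
    return random.choice(matching)
```

Because only small hashes travel over the wire during the consensus step, the full listing (potentially thousands of entries) is transferred once, from a single blobber, rather than from every blobber in the allocation.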