Blobbers can become unresponsive when there are a large number of files/directories. #117

Open
lpoli opened this issue Jun 20, 2021 · 7 comments

@lpoli
Contributor

lpoli commented Jun 20, 2021

I came across the list-all subcommand in zboxcli, which uses the getRemoteFilesAndDirs function: https://github.com/0chain/gosdk/blob/master/zboxcore/sdk/sync.go#L44

It asks the blobbers for the list of files and directories under a path, then recursively does the same for each child directory until it reaches the end of the tree. So if there are, say, 100 subdirectories, it will make at least 100 such requests.
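
To make the request count concrete, here is a minimal sketch of this traversal pattern. It is not the gosdk code: `entry`, `fakeBlobber`, and `walk` are hypothetical names, and the in-memory map is an assumption standing in for the blobber's list endpoint.

```go
package main

import "fmt"

// entry is a hypothetical, simplified list item; the real gosdk types differ.
type entry struct {
	Path  string
	IsDir bool
}

// fakeBlobber stands in for a blobber's list endpoint: it maps a directory
// path to its direct children and counts how often it is queried.
type fakeBlobber struct {
	children map[string][]entry
	requests int
}

func (b *fakeBlobber) list(dir string) []entry {
	b.requests++
	return b.children[dir]
}

// walk lists dir, then recurses into every child directory, so it issues
// one list request per directory in the tree (the starting one included).
func walk(b *fakeBlobber, dir string) []entry {
	var all []entry
	for _, e := range b.list(dir) {
		all = append(all, e)
		if e.IsDir {
			all = append(all, walk(b, e.Path)...)
		}
	}
	return all
}

func main() {
	b := &fakeBlobber{children: map[string][]entry{
		"/":    {{Path: "/a", IsDir: true}, {Path: "/b", IsDir: true}, {Path: "/x.txt"}},
		"/a":   {{Path: "/a/c", IsDir: true}, {Path: "/a/y.txt"}},
		"/a/c": {{Path: "/a/c/z.txt"}},
	}}
	entries := walk(b, "/")
	fmt.Println(len(entries), "entries,", b.requests, "requests")
}
```

In this sketch the number of requests equals the number of directories visited, regardless of how few files each one holds.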

Another option is to request the ObjectTree from the blobbers, i.e. to make the HTTP request described in the docs: https://api.0chain.net/#402a1367-2f35-430b-9eaa-42917ead886b
For instance, if I request the ObjectTree for the root path, the blobber returns a JSON response containing the whole file hierarchy of that allocation.

That call is fine for a small number of files, but an allocation can contain thousands of them. For about 5 directories and 5 files the response was about 60KB, so for a large number of files the response will be large as well. The metadata could be, for example, 20MB, and the blobber would be busy serving that one request for some time, which stalls it.
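
As a rough back-of-the-envelope check of these numbers, the sketch below extrapolates from the 60KB sample. It assumes that response size grows linearly with the number of entries and that 1KB is 1000 bytes; neither is confirmed here, and `entriesForSize` is a hypothetical helper.

```go
package main

import "fmt"

// entriesForSize estimates how many entries would produce a response of
// targetBytes, assuming size grows linearly with the entry count.
func entriesForSize(sampleBytes, sampleEntries, targetBytes int) int {
	perEntry := sampleBytes / sampleEntries
	return targetBytes / perEntry
}

func main() {
	// 60KB for 10 entries (5 directories + 5 files), as observed above.
	n := entriesForSize(60*1000, 10, 20*1000*1000)
	fmt.Println(n) // 3333
}
```

Under those assumptions, a few thousand entries would already be enough to reach a 20MB response.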

And that is just the single-allocation scenario. Blobbers are not confined to a single allocation, and there can be requests from a multitude of clients.

So the solution could be to provide a paginated response or a partial tree response (say, only a few levels of tree depth per response).
There are two other options, the ObjectPath and ReferencePath requests. However, both responses can also grow large and have the same issue as ObjectTree requests.
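
A partial tree response could look roughly like the sketch below, which cuts a tree off after a given number of levels. The `node` type, the `Truncated` flag, and `prune` are hypothetical and are not the blobber's actual schema or API.

```go
package main

import "fmt"

// node is a hypothetical, simplified tree node; it is not the blobber's schema.
type node struct {
	Name      string
	Children  []*node
	Truncated bool // true when children exist but were cut off by the depth limit
}

// prune returns a copy of n that keeps at most maxDepth levels below n.
func prune(n *node, maxDepth int) *node {
	out := &node{Name: n.Name}
	if maxDepth == 0 {
		out.Truncated = len(n.Children) > 0
		return out
	}
	for _, c := range n.Children {
		out.Children = append(out.Children, prune(c, maxDepth-1))
	}
	return out
}

func main() {
	root := &node{Name: "/", Children: []*node{
		{Name: "a", Children: []*node{{Name: "b", Children: []*node{{Name: "c"}}}}},
	}}
	p := prune(root, 1)
	fmt.Println(p.Children[0].Name, p.Children[0].Truncated)
}
```

The `Truncated` flag tells the client which directories it would have to ask about again to go deeper.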

@Kishan-Dhakan
Contributor

Kishan-Dhakan commented Jun 26, 2021

Both list-all and list call the function NewListRequest in gosdk/zboxcore/zboxutil/http.go. It makes an HTTP GET request to the endpoint /v1/file/list/, whose response is the list (docs here).

Therefore, one way to go is for the list and list-all commands to accept offset and limit params (e.g. show 100 items starting from the 51st item). We would then extract and show only the items the client requested. This is not efficient, as it still means fetching the entire list, but the current API doesn't have pagination implemented (as per the docs).
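
The client-side variant amounts to slicing the full list after it has been fetched. A minimal sketch, with `page` as a hypothetical helper and generated file names standing in for a real response:

```go
package main

import "fmt"

// page returns items[offset : offset+limit], clamped to the slice bounds.
// It returns nil for an out-of-range offset or a non-positive limit.
func page(items []string, offset, limit int) []string {
	if offset < 0 || limit <= 0 || offset >= len(items) {
		return nil
	}
	end := offset + limit
	if end > len(items) {
		end = len(items)
	}
	return items[offset:end]
}

func main() {
	// Stand-in for the full list that was already fetched from the blobber.
	all := make([]string, 0, 200)
	for i := 1; i <= 200; i++ {
		all = append(all, fmt.Sprintf("file-%03d", i))
	}
	// "100 items starting from the 51st item" is offset 50, limit 100.
	p := page(all, 50, 100)
	fmt.Println(len(p), p[0], p[len(p)-1])
}
```

Note that `all` still has to be downloaded in full before `page` runs, which is the inefficiency mentioned above.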

The other way is to update the 0chain API to support pagination as well, i.e. accept param(s) at the endpoint /v1/file/list/ that provide context for a paginated response.
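
From the client's point of view, such a request might be built as in the sketch below. Everything beyond the /v1/file/list/ path is an assumption for illustration: the `offset` and `limit` parameter names, the `path` parameter, the allocation segment, and the example host are hypothetical and not taken from the API docs.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// listURL builds a list URL with hypothetical "offset" and "limit" query
// parameters. The allocation segment and all parameter names are assumptions.
func listURL(base, allocation, path string, offset, limit int) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/v1/file/list/" + allocation
	q := url.Values{}
	q.Set("path", path)
	q.Set("offset", strconv.Itoa(offset))
	q.Set("limit", strconv.Itoa(limit))
	u.RawQuery = q.Encode() // Encode sorts the parameters by key
	return u.String(), nil
}

func main() {
	s, err := listURL("http://blobber.example", "alloc123", "/photos", 50, 100)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```

With server-side support like this, the blobber would only need to read and serialize the requested slice instead of the whole list.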

Cc: @iamrz1

@guruhubb
Member

guruhubb commented Jun 27, 2021

Just need to make the change at the blobber end

@lpoli
Contributor Author

lpoli commented Jun 27, 2021

Hello Andrei,
I am working on 0fs, which lets users mount their allocation on a directory and access files with system commands, the same as local files.

It will be a good user experience if they can "cd" into a directory and "ls" its files quickly. Calling the blobbers for each such request will be slow, and making frequent requests for each operation is also costly for the blobber.

To minimize this issue I need a paginated view of the ObjectTree. Currently a request for the ObjectTree of some path returns the entire file tree below that path. It would be good to have the response paginated in both directions (breadth and depth).
For example, a user can have 1000 files in the same directory, so returning all the file info in one response is infeasible. Returning everything down to the full depth of the tree has a similar problem.

So I think we should paginate in both directions: breadth and depth.

0fs would construct the full tree and update it only when the allocation root changes (the allocation root is a hash of the combined path and file changes). That way 0fs can also temporarily save files that were already read to disk, providing easy access.
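
The caching idea can be sketched as follows. This is not 0fs code: `treeCache`, its fields, and the `fetch` callback are hypothetical, and the tree is reduced to a list of paths to keep the example short.

```go
package main

import "fmt"

// treeCache keeps the last fetched tree together with the allocation root
// it was built from. Both the type and its fields are hypothetical.
type treeCache struct {
	root  string
	tree  []string
	valid bool
}

// get returns the cached tree when currentRoot matches the cached root,
// and calls fetch to rebuild it otherwise.
func (c *treeCache) get(currentRoot string, fetch func() []string) []string {
	if c.valid && c.root == currentRoot {
		return c.tree
	}
	c.tree = fetch()
	c.root = currentRoot
	c.valid = true
	return c.tree
}

func main() {
	fetches := 0
	fetch := func() []string {
		fetches++
		return []string{"/a", "/a/x.txt"}
	}
	var c treeCache
	c.get("root-1", fetch)
	c.get("root-1", fetch) // same root: served from the cache
	c.get("root-2", fetch) // root changed: the tree is rebuilt
	fmt.Println(fetches)   // 2
}
```

Comparing one root hash per operation is much cheaper than re-requesting the tree, so "cd" and "ls" can be answered locally until the allocation actually changes.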

@Kishan-Dhakan
Contributor

> Just need to make the change at the blobber end

Right. I only mentioned the other way because this issue was opened in gosdk instead of blobber.

@moldis
Contributor

moldis commented Oct 27, 2021

Need to add this ticket to clients @lpoli

@moldis moldis added the mainnet label Oct 27, 2021
@lpoli
Contributor Author

lpoli commented Oct 27, 2021

What do you mean by clients?

@lpoli
Contributor Author

lpoli commented Apr 15, 2022

With 64GB, a blobber can handle even such large files.
There is a GetRefs endpoint in the blobber, which should be used, and the other methods of getting metadata should be replaced.
With GetRefs, consensus is also calculated among the common fields.

@Kishan-Dhakan Kishan-Dhakan removed their assignment Apr 16, 2022