Listing contents of large s3 folders is slow #140

Open · yoel-ross-zip opened this issue Mar 20, 2022 · 6 comments

Comments

@yoel-ross-zip
Contributor

Hey,

Thanks for your work on this library; I've been using it for a while and it's really nice.

Recently I ran into some issues with long load times for large S3 folders. I believe this is the result of repeated synchronous calls to the abstract lstat method. I did some testing and found that making these calls with asyncio, using the s3fs._info method instead, really speeds things up (roughly 20x faster on large folders).
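
For illustration, here is a minimal sketch of the idea (not the code in #139), assuming s3fs's async interface where _ls and _info are the coroutine counterparts of ls and info; the bucket/prefix is a placeholder:

```python
# Rough sketch, not s3contents' actual implementation: fetch metadata for all
# keys under a prefix concurrently instead of one blocking call per key.
import asyncio

import s3fs

async def list_with_info(prefix: str):
    fs = s3fs.S3FileSystem(asynchronous=True)
    session = await fs.set_session()               # start the aiobotocore session
    try:
        keys = await fs._ls(prefix, detail=False)  # one listing call for the prefix
        # Fan out the per-key metadata lookups instead of doing them one by one
        return await asyncio.gather(*(fs._info(key) for key in keys))
    finally:
        await session.close()

infos = asyncio.run(list_with_info("my-bucket/large-folder"))
print(len(infos), "entries")
```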

I'm currently using a fork I made with these changes, and it works great. I opened a PR for you to consider: #139

I use this library quite a bit, and would be happy to put in the work to get this change merged.

Thanks again!

Joe

@danielfrg
Owner

Fixed thanks to your PR :)
Thanks!

@aleny91 commented Jun 28, 2022

@ziprjoe @danielfrg First of all, many thanks for your valuable work! 😄
I've just installed this new modified version because I noticed the same problem when working with large directories.
Unfortunately, I'm now facing an error. It seems that the .s3keep file is present in the bucket only at the top level, but not in the subdirectories where it is also looked up. Any suggestions?

[screenshot: error traceback]

@yoel-ross-zip
Contributor Author

Hey, it should be a matter of catching the exception and ignoring it. In cases where there is no .s3keep file, there isn't a way to show the last update time, so a dummy date will be displayed.
This PR should fix it: #143
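
For reference, a minimal sketch of that fallback (a hypothetical helper, not the exact change in #143); the DUMMY_DATE value and the .s3keep lookup via fs.info are assumptions for illustration:

```python
from datetime import datetime, timezone

# Hypothetical fallback, not the library's exact code.
DUMMY_DATE = datetime(1970, 1, 1, tzinfo=timezone.utc)

def dir_last_modified(fs, prefix: str):
    """Return a directory's last-modified time, or a dummy date if .s3keep is missing."""
    try:
        return fs.info(f"{prefix}/.s3keep")["LastModified"]
    except FileNotFoundError:
        # No .s3keep placeholder in this directory: ignore the error, show a dummy date
        return DUMMY_DATE
```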

@fakhavan

@ziprjoe @danielfrg Firstly, I'd like to express my gratitude for your excellent work on this library. It has been incredibly useful for my use case of connecting S3 with JupyterHub compared to the alternatives.

However, I've encountered an issue when using s3contents to connect to an S3 bucket with pre-existing directories. These directories aren't displayed in the UI unless I manually add a .s3keep file to each directory. Once I do this, the issue is resolved. I'm wondering if you are aware of the cause of this problem and if there's a way to use s3contents with a bucket that has pre-existing directories without having to manually add .s3keep files to each directory.

Thank you for your time and attention!

@danielfrg
Owner

Hi @ziprjoe.

I think there are newer ways to handle directories in S3 that do not require the placeholder files. I have not tested them, and to be honest I am not using this lib anymore.

I try to keep it updated, but since I am not using it, it is behind on needed features, and I don't expect to be able to add new features in the near future. I basically just handle new releases from contributors at this point.

@fbaldo31

> I'm wondering if you are aware of the cause of this problem and if there's a way to use s3contents with a bucket that has pre-existing directories without having to manually add .s3keep files to each directory.

I handle that with a script called in the postStart lifecycle hook:

file=$HOME/.dir.txt

# Save the bucket's directory tree (key names start at column 32 of `aws s3 ls` output)
aws s3 ls --recursive s3://<bucket> | cut -c32- | xargs -d '\n' -n 1 dirname | sort -u > "$file"

# Create an empty local placeholder once, then copy it into every directory
touch .s3keep

while IFS= read -r folder; do
    [ "$folder" = "." ] && continue   # skip keys that live at the bucket root
    aws s3 cp .s3keep "s3://<bucket>/$folder/.s3keep"
done < "$file"
