Metadata Caching in case of not existing file #831

Open
nguyenminhdungpg opened this issue Apr 1, 2024 · 12 comments
Labels
enhancement New feature or request

Comments

@nguyenminhdungpg

nguyenminhdungpg commented Apr 1, 2024

If I use the --cache option, will mountpoint-s3 avoid caching the metadata of a requested file that does not exist?
In s3fs, the disable_noobj_cache option helps me achieve this.

Thank you very much

@dannycjones
Contributor

dannycjones commented Apr 2, 2024

Hey @nguyenminhdungpg. Mountpoint will indeed cache the non-existence of a file. We've sometimes described this as 'negative caching'. This behavior was introduced in Mountpoint v1.5.0.

It sounds like you would prefer that Mountpoint would not cache these entries, and instead go to S3 each time if it has not seen that the file exists. Can you tell us a bit more about your use case? What problems does caching the negative entries introduce for you?

dannycjones added the enhancement label Apr 2, 2024
@nguyenminhdungpg
Author

nguyenminhdungpg commented Apr 4, 2024

Hi @dannycjones thank you for your information.

My use case for the disable_noobj_cache option of s3fs is admittedly unusual and not ideal.

Two services run in parallel, and there is a chance that one requests a file in the s3fs-mounted folder before the other has finished adding it to S3 using the AWS SDK for S3.
If I don't use disable_noobj_cache, the non-existence of the file, and maybe of its parent folders, is cached. High-frequency requests for the file can then make it (and its parent folders) seem to not exist forever. This also breaks logic in my service that copies files from, or creates new files in, those seemingly non-existent folders in the s3fs-mounted folder.
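
The failure mode described above can be reproduced with a toy model. This is a minimal sketch and not how s3fs or Mountpoint are actually implemented: `ToyMetadataCache`, `bucket`, and the fake clock are hypothetical names, and the model assumes a single TTL that covers both positive and negative entries.

```python
class ToyMetadataCache:
    """Toy lookup cache that remembers both hits and misses for `ttl` seconds."""

    def __init__(self, bucket, ttl, clock):
        self.bucket = bucket      # stand-in for S3: a set of existing keys
        self.ttl = ttl
        self.clock = clock        # callable returning the current time in seconds
        self.entries = {}         # key -> (exists, expires_at)

    def exists(self, key):
        cached = self.entries.get(key)
        if cached is not None and cached[1] > self.clock():
            return cached[0]      # served from the cache, possibly stale
        found = key in self.bucket
        self.entries[key] = (found, self.clock() + self.ttl)
        return found


now = [0.0]                       # fake clock so the timeline is deterministic
bucket = set()
cache = ToyMetadataCache(bucket, ttl=60.0, clock=lambda: now[0])

before_upload = cache.exists("a/b/file.txt")  # reader arrives first: the miss is cached
bucket.add("a/b/file.txt")                    # writer finishes its upload
now[0] = 30.0
during_ttl = cache.exists("a/b/file.txt")     # still answered from the negative entry
now[0] = 61.0
after_ttl = cache.exists("a/b/file.txt")      # entry expired, backend consulted again
print(before_upload, during_ttl, after_ttl)
```

In this model the file only becomes visible once the negative entry expires, so a TTL measured in months makes the "not found" answer effectively permanent for the reader.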

On the other hand, I wonder: without 'negative caching', could an attacker flood requests for non-existing files and increase my S3 cost?

@nguyenminhdungpg
Author

Hi @dannycjones
Is there any news on how to turn off the negative cache?

Currently I set the TTL to 6 months, but because the negative cache is always enabled, some of the logic in my service is broken.

Thank you.

@monthonk
Contributor

monthonk commented May 8, 2024

Hey, we don't have any updates to share yet. For your use case, I wonder why you set the TTL to be so long? We usually recommend a long TTL for static buckets whose content doesn't change often. Would it be possible to reduce the TTL and change the logic in your service to retry?
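
A retry on the reader side could look roughly like the sketch below. The helper `wait_for_file` and its parameters are hypothetical, not part of Mountpoint. It only helps when the metadata TTL is shorter than the retry window, since otherwise the cached negative entry keeps answering every attempt.

```python
import os
import time


def wait_for_file(path, timeout=30.0, interval=1.0, exists=os.path.exists,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll until `path` exists or `timeout` seconds elapse.

    Returns True if the file appeared, False if the deadline passed.
    `exists`, `sleep`, and `clock` are injectable to keep the helper testable.
    """
    deadline = clock() + timeout
    while True:
        if exists(path):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The polling interval would normally be chosen no smaller than the metadata TTL, because lookups made while the negative entry is still valid cannot succeed.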

Another option is calling readdir to discover the new files, because we don't serve readdir results from the cache, but it's a more expensive operation.
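
As a rough illustration of the readdir route, the sketch below checks for a file by listing its parent directory instead of looking the path up directly. `exists_via_listing` is a hypothetical helper name, and whether the listing reaches S3 depends on the behavior described above. It may not help if the parent directory itself already has a cached negative entry.

```python
import os


def exists_via_listing(path):
    """Check for `path` by listing its parent directory (a readdir) rather than a lookup."""
    parent, name = os.path.split(os.path.abspath(path))
    try:
        with os.scandir(parent) as entries:
            return any(entry.name == name for entry in entries)
    except (FileNotFoundError, NotADirectoryError):
        return False  # the parent itself is missing or is not a directory
```

Each call walks the whole parent directory, so the cost grows with the number of entries in it.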

@nguyenminhdungpg
Author

Hi @monthonk, thank you for your information.

I have a service that uploads files to S3 using the AWS SDK. Files are stored in a multi-level folder tree and are never modified or deleted. mount-s3 serves files for my website, which has very high traffic. My CDN cache is not working well: it still misses many requests, so the load on mount-s3 is very high at peak times. That's why I set the mount-s3 TTL that long; I need files to load as fast as possible.

The mount-s3 negative cache caches metadata not only for files but also for folders. So when my service checks whether a file or folder exists, the folders' non-existence is cached by the negative cache, which breaks the logic of subsequent requests (a lot of requests).
I can't use readdir because we store a very large number of files and the system has very high traffic; readdir would hurt performance.

I just need a way to turn off the negative cache, like disable_noobj_cache in s3fs.

From what I see in many systems, a 404 is normally either not cached or cached with a very small TTL. It would be reasonable to be able to turn off the negative cache, or for the negative cache to have its own TTL.
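
To make the request concrete, here is a minimal sketch of a cache with separate lifetimes for hits and misses. It is purely illustrative: `DualTtlCache`, `positive_ttl`, and `negative_ttl` are hypothetical names, not Mountpoint options. The `backend_calls` counter is included to show the cost side of the trade-off raised earlier in this thread.

```python
import time


class DualTtlCache:
    """Toy existence cache with separate lifetimes for hits and misses."""

    def __init__(self, lookup, positive_ttl, negative_ttl, clock=time.monotonic):
        self.lookup = lookup              # callable(key) -> bool, stand-in for an S3 request
        self.positive_ttl = positive_ttl  # how long a "found" answer is trusted
        self.negative_ttl = negative_ttl  # how long "not found" is trusted; 0 disables it
        self.clock = clock
        self.entries = {}                 # key -> (exists, expires_at)
        self.backend_calls = 0            # every call here would be a billed request

    def exists(self, key):
        cached = self.entries.get(key)
        if cached is not None and cached[1] > self.clock():
            return cached[0]
        self.backend_calls += 1
        found = self.lookup(key)
        ttl = self.positive_ttl if found else self.negative_ttl
        if ttl > 0:
            self.entries[key] = (found, self.clock() + ttl)
        else:
            self.entries.pop(key, None)   # do not remember this answer at all
        return found
```

With a short `negative_ttl`, a newly uploaded object becomes visible after at most that many seconds while existing objects stay cached for the long `positive_ttl`. With `negative_ttl=0`, every lookup of a missing key reaches the backend.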

@monthonk
Contributor

monthonk commented May 8, 2024

Thanks for clarifying. I assume that your two services are running on different systems, so the writer can't access the Mountpoint directory, and that's why you use the AWS SDK to upload the files. Is that correct?

@nguyenminhdungpg
Author

@monthonk yes, that's correct. I use the AWS SDK for the upload service and s3fs/mount-s3 for the file-serving service (read-only).

@monthonk
Contributor

monthonk commented May 8, 2024

Thanks for your information @nguyenminhdungpg, I think an option to turn off negative caching is a valid feature request. We will come back to you once we have any news on this.

@nguyenminhdungpg
Author

@monthonk Thank you. I look forward to hearing from you.

@nguyenminhdungpg
Author

Hi @monthonk, I'd like to ask a question.
If the O_DIRECT flag is used when opening a file, Mountpoint will check S3 to ensure the object exists and return the latest object content. Does it also update the cache with the file's latest metadata afterwards?
Thank you.

@monthonk
Contributor

Hey @nguyenminhdungpg, yes, if you use O_DIRECT to open a file it will bypass the metadata cache and also update the cache after that.

@nguyenminhdungpg
Author

nguyenminhdungpg commented May 23, 2024

Hi @monthonk, thank you. I've just implemented a workaround that bypasses the negative caching using O_DIRECT: it fetches the file from S3 and updates the cache. The workaround helps resolve my issue, but it would be great if I could turn off negative caching or set a negative caching TTL using mount-s3 command options (e.g. set the negative cache TTL to 30s or 1m). I hope the next version of mount-s3 comes with this feature.
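
A workaround of this kind might look like the sketch below. It relies only on the behavior stated in the replies above (an O_DIRECT open bypasses the metadata cache and refreshes it). The helper name `probe_with_o_direct` is hypothetical, and `os.O_DIRECT` is assumed to be available, which is the case on Linux.

```python
import os


def probe_with_o_direct(path):
    """Open `path` with O_DIRECT and report whether it exists.

    Only the open itself matters here; the descriptor is closed immediately,
    and a normal (cached) open can follow if the probe succeeds.
    """
    # O_DIRECT is platform-specific; where it is missing this degrades to a plain open.
    flags = os.O_RDONLY | getattr(os, "O_DIRECT", 0)
    try:
        fd = os.open(path, flags)
    except FileNotFoundError:
        return False
    os.close(fd)
    return True
```

Other errors are deliberately left to propagate, since some filesystems reject O_DIRECT outright. Each probe presumably costs at least one request to S3, so it is best reserved for paths that were recently reported as missing.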
