Caching performance should be better #719
Comments
Based on the metrics in the debug logs, the cache appears to be "working": blocks are pulled and read from the cache. However, the performance is not close to what I would expect for reading from disk. The same 10G file copied to my /mnt/raid0 disk comes off as such...
As a next step, I'm going to try adding some flamegraphs to the tracing and see where this is peaking.
After sprinkling in a bunch of instrument attributes, spans, and a flamegraph layer, I can confirm that all of our time is spent deserializing in mountpoint_s3::data_cache::disk_data_cache.
Thanks for reporting this. In [1], we are introducing cache benchmarks. With these in place, we want to explore improvements, for example by experimenting with a different serialization library. [1] #783
I should mention that the tracing changes I made are in my fork here: https://github.com/raykrueger/mountpoint-s3/tree/flamegraphs. I had no intention of sending a PR for the changes, as they were pretty heavy-handed.
Mountpoint for Amazon S3 version
mount-s3 1.3.2
AWS Region
us-west-2
Describe the running environment
Ubuntu 22.04 and Amazon Linux 2023
Mountpoint options
What happened?
Using the --cache arg provides no performance benefit.
I have my cache dir set to /mnt/raid0, which is a RAID 0 mount across two NVMe drives. When I read a 10G file from my S3 mount, it takes 9-10 seconds. On every read after the first, I expect the read to be as fast as reading the file directly from /mnt/raid0. Instead, every subsequent read takes the same amount of time as the first. It also takes the same amount of time if I have no --cache at all.
I see the same behavior whether I mount directory (S3 Express One Zone) buckets or regular S3 buckets.
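For anyone wanting to reproduce the comparison described above, here is a minimal sketch of how the repeated reads could be timed. It is not the tooling used for this report; the function names `timed_read` and `compare_reads` and the 1 MiB chunk size are assumptions made for illustration. Run it once against the file on the S3 mount and once against the copy on the local disk, then compare the per-pass times.

```python
import sys
import time


def timed_read(path, chunk_size=1024 * 1024):
    """Read `path` sequentially; return (bytes_read, seconds_elapsed)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python's own userspace buffer; the OS may still cache.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def compare_reads(path, passes=3):
    """Read the same file `passes` times; return a list of (bytes, seconds)."""
    return [timed_read(path) for _ in range(passes)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python timed_read.py <path to the 10G file>
    for i, (n, secs) in enumerate(compare_reads(sys.argv[1]), start=1):
        mib_per_s = (n / (1024 * 1024)) / max(secs, 1e-9)
        print(f"pass {i}: {n} bytes in {secs:.2f}s ({mib_per_s:.1f} MiB/s)")
```

Note that repeated reads of a local file can be served from the kernel page cache, so the local-disk numbers may look better than the raw device speed unless the cache is dropped between passes.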
Relevant log output
The 2s drop is within the margin of error.
The cache is populated.
This is the cache performance I expect.