Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for FOPEN_CACHE_DIR #454

Open
c-nuro opened this issue Mar 24, 2023 · 11 comments
Open

Support for FOPEN_CACHE_DIR #454

c-nuro opened this issue Mar 24, 2023 · 11 comments
Labels
Feature request for a feature

Comments

@c-nuro
Copy link

c-nuro commented Mar 24, 2023

libfuse/libfuse@1552b46

For having an option to cache direntry.

@hanwen
Copy link
Owner

hanwen commented Mar 25, 2023

yes, I should add that.

Note to self: need to change NodeOpendirer interface, so requires a v3 of the package.

@hanwen
Copy link
Owner

hanwen commented Mar 27, 2023

see https://review.gerrithub.io/c/hanwen/go-fuse/+/551748

Has anyone ever seen this working in practice?

@c-nuro
Copy link
Author

c-nuro commented Mar 27, 2023

I made a local patch and it works for me, but it requires both cache_dir and keep_cache flag

diff --git a/fs/bridge.go b/fs/bridge.go
index a292067..cb58182 100644
--- a/fs/bridge.go
+++ b/fs/bridge.go
@@ -918,6 +918,7 @@ func (b *rawBridge) OpenDir(cancel <-chan struct{}, input *fuse.OpenIn, out *fus
 	b.mu.Lock()
 	defer b.mu.Unlock()
 	out.Fh = uint64(b.registerFile(n, nil, 0))
+	out.OpenFlags |= fuse.FOPEN_CACHE_DIR | fuse.FOPEN_KEEP_CACHE
 	return fuse.OK
 }

@c-nuro
Copy link
Author

c-nuro commented Mar 29, 2023

FYI: One more issue I noticed in our testing: Default implementation of GetChildren doesn't guarantee the order due to map iteration https://go.dev/blog/maps . This can cause some race in kernel cache if multiple opendir is running at same time, because it uses the position + version to identify each dirent.

@hanwen
Copy link
Owner

hanwen commented Mar 29, 2023

FYI: One more issue I noticed in our testing: Default implementation of GetChildren doesn't guarantee the order due to map iteration https://go.dev/blog/maps . This can cause some race in kernel cache if multiple opendir is running at same time, because it uses the position + version to identify each dirent.

Can you tell me more about these races, and/or how the kernel uses the position?

@c-nuro
Copy link
Author

c-nuro commented Mar 29, 2023

Kernel will append the direntry to cache if the position of dirent matches the current size of cache, plus some mtime/version check of inode that invalidates the entire cache.
If multiple readdir in-progress return entries in different order, the same entry could be added multiple times and also missing some entries.

@hanwen
Copy link
Owner

hanwen commented Mar 29, 2023

sorry, it took me a while to understand: you are saying that concurrent calls in combination with FOPEN_CACHE_DIR cause entries to go missing? If yes, that makes perfect sense.

Would it be problematic if this was left to implementor (ie. you) to create a reproducible DirStream?

I think it would also be possible to make this reproducible by representing children inside Inode as

  // index into childrenList
  children map[string]int 
  childrenList []*Inode

but it would be quite a bit of work.

@c-nuro
Copy link
Author

c-nuro commented Mar 29, 2023

sorry, it took me a while to understand: you are saying that concurrent calls in combination with FOPEN_CACHE_DIR cause entries to go missing? If yes, that makes perfect sense.

Would it be problematic if this was left to implementor (ie. you) to create a reproducible DirStream?

I think it would also be possible to make this reproducible by representing children inside Inode as

  // index into childrenList
  children map[string]int 
  childrenList []*Inode

but it would be quite a bit of work.

That's how I workaround it eventually, I implemented my own Readdir handler by sorting entries by name.

Just let you know the default impl that iterates Children isn't reproducible and it may cause some confusion if other app wants to use it.

@rfjakob
Copy link
Contributor

rfjakob commented Apr 4, 2023

Blocked-by: #391

@hanwen
Copy link
Owner

hanwen commented Apr 7, 2023

Just let you know the default impl that iterates Children isn't reproducible and it may cause some confusion if other app wants to use it.

if you rebase onto master, this will be fixed.

I'm holding https://review.gerrithub.io/c/hanwen/go-fuse/+/551748 off for a few days, because I want to fix outstanding bugs before I create a v3 tag.

@sbunce
Copy link

sbunce commented Aug 16, 2023

I've been experimenting with this today because we have a use-case that's Readdir heavy (python scripts for ML things). One run goes from 94k Readdir calls down to 6.75k with FOPEN_CACHE_DIR. This results in a 4% speedup, which is significant.

I made a temporary fork and copy/pasted your change here to test it out. Let me know if any help is needed for testing to help land this change. 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature request for a feature
Projects
None yet
Development

No branches or pull requests

4 participants