Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs: TestStaleHardlinks is flaky #478

Closed
navytux opened this issue Jun 12, 2023 · 11 comments
Closed

fs: TestStaleHardlinks is flaky #478

navytux opened this issue Jun 12, 2023 · 11 comments

Comments

@navytux
Copy link
Contributor

navytux commented Jun 12, 2023

From #351 (comment) :

I was running tests in a loop as:

$ while true; do echo; go test -p 1 -count 1 ./... || break ; done

and got breakage in fs in ~ 5 minutes:

...
warning: Inode.Path: n2 is orphaned, replacing segment with ".go-fuse.12305257969772279832/deleted"
20:03:22.593626 writer: Write/Writev failed, err: 2=no such file or directory. opcode: RELEASEDIR
warning: Inode.Path: n3 is orphaned, replacing segment with ".go-fuse.901660240436591636/deleted"
warning: Inode.Path: n854 is orphaned, replacing segment with ".go-fuse.4890713566819029247/deleted"
warning: Inode.Path: n951 is orphaned, replacing segment with ".go-fuse.15899998848308155049/deleted"
warning: Inode.Path: n2084 is orphaned, replacing segment with ".go-fuse.2297604607127977387/deleted"
warning: Inode.Path: n2814 is orphaned, replacing segment with ".go-fuse.14552686421981569371/deleted"
warning: Inode.Path: n2990 is orphaned, replacing segment with ".go-fuse.9123635009483468768/deleted"
warning: Inode.Path: n3170 is orphaned, replacing segment with ".go-fuse.15177763141428033337/deleted"
warning: Inode.Path: n3342 is orphaned, replacing segment with ".go-fuse.3646338540004335763/deleted"
warning: Inode.Path: n3423 is orphaned, replacing segment with ".go-fuse.1318443847442105450/deleted"
warning: Inode.Path: n3530 is orphaned, replacing segment with ".go-fuse.4342381435690588638/deleted"
warning: Inode.Path: n3617 is orphaned, replacing segment with ".go-fuse.1428700877519216651/deleted"
warning: Inode.Path: n3711 is orphaned, replacing segment with ".go-fuse.401526560035548991/deleted"
warning: Inode.Path: n5425 is orphaned, replacing segment with ".go-fuse.9338043697404393728/deleted"
warning: Inode.Path: n7037 is orphaned, replacing segment with ".go-fuse.729628660140245879/deleted"
warning: Inode.Path: n7235 is orphaned, replacing segment with ".go-fuse.1369596357678539859/deleted"
warning: Inode.Path: n7461 is orphaned, replacing segment with ".go-fuse.1369287383292163374/deleted"
warning: Inode.Path: n8305 is orphaned, replacing segment with ".go-fuse.17808099055425098859/deleted"
warning: Inode.Path: n9306 is orphaned, replacing segment with ".go-fuse.5174121311455743018/deleted"
warning: Inode.Path: n9811 is orphaned, replacing segment with ".go-fuse.2107295909796755029/deleted"
warning: Inode.Path: n9918 is orphaned, replacing segment with ".go-fuse.5183084698562304044/deleted"
warning: Inode.Path: n10824 is orphaned, replacing segment with ".go-fuse.486187419106585934/deleted"
warning: Inode.Path: n12942 is orphaned, replacing segment with ".go-fuse.8444501897156723168/deleted"
warning: Inode.Path: n13432 is orphaned, replacing segment with ".go-fuse.4542056890342277729/deleted"
warning: Inode.Path: n13637 is orphaned, replacing segment with ".go-fuse.6102584255343135637/deleted"
warning: Inode.Path: n13833 is orphaned, replacing segment with ".go-fuse.866370257234569186/deleted"
warning: Inode.Path: n13930 is orphaned, replacing segment with ".go-fuse.15087026409322293597/deleted"
warning: Inode.Path: n14713 is orphaned, replacing segment with ".go-fuse.11648377526940701427/deleted"
warning: Inode.Path: n15106 is orphaned, replacing segment with ".go-fuse.12246981705567265835/deleted"
warning: Inode.Path: n15524 is orphaned, replacing segment with ".go-fuse.16982074749685265062/deleted"
warning: Inode.Path: n18203 is orphaned, replacing segment with ".go-fuse.15654435576189543409/deleted"
warning: Inode.Path: n19304 is orphaned, replacing segment with ".go-fuse.4095005409658110961/deleted"
warning: Inode.Path: n19511 is orphaned, replacing segment with ".go-fuse.18158904314772731366/deleted"
warning: Inode.Path: n20540 is orphaned, replacing segment with ".go-fuse.13427310443358862917/deleted"
warning: Inode.Path: n21135 is orphaned, replacing segment with ".go-fuse.7950643906035889115/deleted"
warning: Inode.Path: n21330 is orphaned, replacing segment with ".go-fuse.8350283527915559158/deleted"
warning: Inode.Path: n21728 is orphaned, replacing segment with ".go-fuse.14787976173072795546/deleted"
warning: Inode.Path: n21923 is orphaned, replacing segment with ".go-fuse.7790348919694619749/deleted"
warning: Inode.Path: n22226 is orphaned, replacing segment with ".go-fuse.14996396507077797745/deleted"
warning: Inode.Path: n23026 is orphaned, replacing segment with ".go-fuse.16890645689759266835/deleted"
warning: Inode.Path: n23318 is orphaned, replacing segment with ".go-fuse.6196426916292442094/deleted"
warning: Inode.Path: n23601 is orphaned, replacing segment with ".go-fuse.17054134701676197557/deleted"
warning: Inode.Path: n24004 is orphaned, replacing segment with ".go-fuse.366239376632507138/deleted"
warning: Inode.Path: n24722 is orphaned, replacing segment with ".go-fuse.11359620588908067237/deleted"
warning: Inode.Path: n24812 is orphaned, replacing segment with ".go-fuse.4242556164732877867/deleted"
warning: Inode.Path: n25781 is orphaned, replacing segment with ".go-fuse.5436548359282688598/deleted"
warning: Inode.Path: n26079 is orphaned, replacing segment with ".go-fuse.5542355895719171192/deleted"
warning: Inode.Path: n26672 is orphaned, replacing segment with ".go-fuse.205714894418993594/deleted"
warning: Inode.Path: n26875 is orphaned, replacing segment with ".go-fuse.4185528009913706605/deleted"
warning: Inode.Path: n26974 is orphaned, replacing segment with ".go-fuse.17867490425958263130/deleted"
warning: Inode.Path: n27071 is orphaned, replacing segment with ".go-fuse.7224419671554691606/deleted"
warning: Inode.Path: n28066 is orphaned, replacing segment with ".go-fuse.11478882319185392135/deleted"
--- FAIL: TestStaleHardlinks (0.01s)
    simple_test.go:665: no such file or directory
FAIL
FAIL    github.com/hanwen/go-fuse/fs    2.717s

go-fuse: v2.3.0-5-g90b055a
os: Linux deca 6.1.0-9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.27-1 (2023-05-08) x86_64 GNU/Linux
go: go version go1.20.5 linux/amd64

I also do see fs tests failing on github CI, e.g. here: https://github.com/hanwen/go-fuse/actions/runs/5210956777/jobs/9402681092#step:6:48 .

@navytux
Copy link
Contributor Author

navytux commented Jun 14, 2023

Reliable way to reproduce:

(neo) (z-dev) (g.env) kirr@deca:~/src/neo/src/github.com/hanwen/go-fuse/fs$ go test -run TestStaleHardlinks -count 1000 -failfast
--- FAIL: TestStaleHardlinks (0.00s)
    simple_test.go:675: no such file or directory
FAIL
exit status 1
FAIL    github.com/hanwen/go-fuse/fs    0.081s

(neo) (z-dev) (g.env) kirr@deca:~/src/neo/src/github.com/hanwen/go-fuse/fs$ go test -run TestStaleHardlinks -count 1000 -failfast
--- FAIL: TestStaleHardlinks (0.00s)
    simple_test.go:675: no such file or directory
FAIL
exit status 1
FAIL    github.com/hanwen/go-fuse/fs    0.918s

(neo) (z-dev) (g.env) kirr@deca:~/src/neo/src/github.com/hanwen/go-fuse/fs$ go test -run TestStaleHardlinks -count 1000 -failfast
--- FAIL: TestStaleHardlinks (0.01s)
    simple_test.go:675: no such file or directory
FAIL
exit status 1
FAIL    github.com/hanwen/go-fuse/fs    6.869s

go-fuse v2.3.0-11-g255ab74

@hanwen
Copy link
Owner

hanwen commented Jul 8, 2023

this test was introdcued in c318613 by @rfjakob . It's not representative of a correctly written FUSE filesystem (the "behind the back" deletions should be coupled with notifications back to the kernel).

the follow-on a90e1f4 says that the test is fixed now, but I don't understand why that would make it work.

@rfjakob
Copy link
Contributor

rfjakob commented Jul 8, 2023

What I meant with "behind the back" is that somebody else is modifying your backing file store (if you have one, like loopback or gocryptfs). That can happen when your files are on NFS or something and another host modifies it, or sync tools like Dropbox doing their thing.

But in any case, the test should not blow up, I'll try to take a look.

@rfjakob
Copy link
Contributor

rfjakob commented Jul 8, 2023

Oh. Excerpt below, full log here: https://gist.github.com/rfjakob/e812730df3918122c3393d010ed4db43

tx 206 gets heavily delayed after other LOOKUPs have succeeded.

21:26:56.191773 rx 206: LOOKUP n1 ["link11"] 7b
21:26:56.191800 rx 208: LOOKUP n1 ["link0"] 6b
21:26:56.191819 tx 208:     OK, {n2 g1 tE=0s tA=0s {M0100600 SZ=0 L=16 1026:1026 B0*4096 i0:39670 A 1688844416.180050 M 1688844416.180050 C 1688844416.190038}}
21:26:56.191913 rx 210: LOOKUP n1 ["link16"] 7b
21:26:56.191943 tx 210:     2=no such file or directory, {n0 g0 tE=0s tA=0s {M00 SZ=0 L=0 0:0 B0*0 i0:0 A 0.000000 M 0.000000 C 0.000000}}
21:26:56.192006 rx 212: LINK n1 {Oldnodeid: n2} ["link16"] 7b
21:26:56.192126 tx 212:     OK, {n2 g1 tE=0s tA=0s {M0100600 SZ=0 L=17 1026:1026 B0*4096 i0:39670 A 1688844416.180050 M 1688844416.180050 C 1688844416.191037}}
21:26:56.192245 rx 214: LOOKUP n1 ["link0"] 6b
21:26:56.192339 tx 214:     OK, {n2 g1 tE=0s tA=0s {M0100600 SZ=0 L=17 1026:1026 B0*4096 i0:39670 A 1688844416.180050 M 1688844416.180050 C 1688844416.191037}}
21:26:56.192436 rx 216: LOOKUP n1 ["link17"] 7b
21:26:56.192524 tx 216:     2=no such file or directory, {n0 g0 tE=0s tA=0s {M00 SZ=0 L=0 0:0 B0*0 i0:0 A 0.000000 M 0.000000 C 0.000000}}
21:26:56.192650 rx 218: LINK n1 {Oldnodeid: n2} ["link17"] 7b
21:26:56.192686 tx 218:     OK, {n2 g1 tE=0s tA=0s {M0100600 SZ=0 L=18 1026:1026 B0*4096 i0:39670 A 1688844416.180050 M 1688844416.180050 C 1688844416.191037}}
21:26:56.192719 rx 220: LOOKUP n1 ["link0"] 6b
21:26:56.192738 tx 220:     OK, {n2 g1 tE=0s tA=0s {M0100600 SZ=0 L=18 1026:1026 B0*4096 i0:39670 A 1688844416.180050 M 1688844416.180050 C 1688844416.191037}}
21:26:56.192778 rx 222: LOOKUP n1 ["link18"] 7b
21:26:56.192795 tx 222:     2=no such file or directory, {n0 g0 tE=0s tA=0s {M00 SZ=0 L=0 0:0 B0*0 i0:0 A 0.000000 M 0.000000 C 0.000000}}
21:26:56.192828 rx 224: LINK n1 {Oldnodeid: n2} ["link18"] 7b
21:26:56.192858 tx 224:     OK, {n2 g1 tE=0s tA=0s {M0100600 SZ=0 L=19 1026:1026 B0*4096 i0:39670 A 1688844416.180050 M 1688844416.180050 C 1688844416.192036}}
21:26:56.192933 rx 226: LOOKUP n1 ["link0"] 6b
21:26:56.192962 tx 226:     OK, {n2 g1 tE=0s tA=0s {M0100600 SZ=0 L=19 1026:1026 B0*4096 i0:39670 A 1688844416.180050 M 1688844416.180050 C 1688844416.192036}}
21:26:56.193021 rx 228: LOOKUP n1 ["link19"] 7b
21:26:56.193076 tx 228:     2=no such file or directory, {n0 g0 tE=0s tA=0s {M00 SZ=0 L=0 0:0 B0*0 i0:0 A 0.000000 M 0.000000 C 0.000000}}
21:26:56.193127 rx 230: LINK n1 {Oldnodeid: n2} ["link19"] 7b
21:26:56.193185 tx 230:     OK, {n2 g1 tE=0s tA=0s {M0100600 SZ=0 L=20 1026:1026 B0*4096 i0:39670 A 1688844416.180050 M 1688844416.180050 C 1688844416.192036}}
21:26:56.193326 rx 232: LOOKUP n1 ["link0"] 6b
21:26:56.193381 tx 232:     OK, {n2 g1 tE=0s tA=0s {M0100600 SZ=0 L=1 1026:1026 B0*4096 i0:39670 A 1688844416.180050 M 1688844416.180050 C 1688844416.192036}}
21:26:56.200238 tx 206:     OK, {n2 g1 tE=0s tA=0s {M0100600 SZ=0 L=16 1026:1026 B0*4096 i0:39670 A 1688844416.180050 M 1688844416.180050 C 1688844416.190038}}
21:26:56.200243 rx 234: OPEN n2 {0x8000} 
21:26:56.200292 rx 236: GETATTR n2 {Fh 0 } 
21:26:56.200293 tx 234:     2=no such file or directory, {Fh 0 }
21:26:56.200326 tx 236:     2=no such file or directory, {tA=0s {M00 SZ=0 L=0 0:0 B0*0 i0:0 A 0.000000 M 0.000000 C 0.000000}}

@hanwen
Copy link
Owner

hanwen commented Sep 9, 2023

@rfjakob - any update here?

@rfjakob
Copy link
Contributor

rfjakob commented Sep 9, 2023 via email

@rfjakob
Copy link
Contributor

rfjakob commented May 20, 2024

I can still repro, but not reliably, and the logs are noisy.

Looks like gvfsd-trash sets inotify_add_watch on the mountpoint and loops over all files. This is the noise in the logs.

$ ps -eLf | grep 63100
jakob      63095    2691   63100  0    5 22:22 ?        00:00:04 /usr/libexec/gvfsd-trash --spawner :1.16 /org/gtk/gvfs/exec_spaw/0

strace shows:

[pid 63116] inotify_add_watch(8, "/tmp/TestStaleHardlinks809765178/001/mnt", IN_MODIFY|IN_ATTRIB|IN_CLOSE_WRITE|IN_MOVED_FROM|IN_MOVED_TO|IN_CREATE|IN_DELETE|IN_DELETE_SELF|IN_MOVE_SELF|IN_UNMOUNT|IN_ONLYDIR) = 20256

@rfjakob
Copy link
Contributor

rfjakob commented May 20, 2024

pkill gvfsd-trash makes the problem go away.

The extra LOOKUPs from gvfsd-trash race against the test logic.

@rfjakob
Copy link
Contributor

rfjakob commented May 28, 2024

@navytux
Copy link
Contributor Author

navytux commented May 29, 2024

Thanks, @rfjakob.

@navytux
Copy link
Contributor Author

navytux commented May 29, 2024

P.S. https://review.gerrithub.io/c/hanwen/go-fuse/+/1195362 was submitted but did not appear in master on github. Probably there is again gerrithub -> github synchronization issue.

hanwen pushed a commit that referenced this issue May 29, 2024
It is sometimes helpful to know which process accesses
the filesystem.

Relates-to: #478
Change-Id: Ie435d093f03a32c02f02b45df880a207c5d7cad1
@hanwen hanwen closed this as completed in fc0fdbf May 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants