0.13.2 corrupts cache and cannot start (go.etcd.io/bbolt.(*freelist).read, panic: invalid freelist page: 0, page type is unknown<0) #4909

Open
davhdavh opened this issue May 10, 2024 · 4 comments

Comments

@davhdavh
Contributor

After upgrade to 0.13.2 a week ago, we have twice seen it crash and totally corrupt the cache drive, after which it fails to start with:

time="2024-05-10T08:43:48Z" level=warning msg="TLS is not enabled for tcp://0.0.0.0:12345. enabling mutual TLS authentication is highly recommended"
time="2024-05-10T08:43:48Z" level=info msg="auto snapshotter: using overlayfs"
time="2024-05-10T08:43:48Z" level=warning msg="NoProcessSandbox is enabled. Note that NoProcessSandbox allows build containers to kill (and potentially ptrace) an arbitrary process in the BuildKit host namespace. NoProcessSandbox should be enabled only when the BuildKit is running in a container as an unprivileged user."
time="2024-05-10T08:43:48Z" level=info msg="found worker \"yttpibzlvogkwvo25x638zc8c\", labels=map[org.mobyproject.buildkit.worker.executor:oci org.mobyproject.buildkit.worker.hostname:gha-buildkitd-0 org.mobyproject.buildkit.worker.network:host org.mobyproject.buildkit.worker.oci.process-mode:no-sandbox org.mobyproject.buildkit.worker.selinux.enabled:false org.mobyproject.buildkit.worker.snapshotter:overlayfs], platforms=[linux/amd64 linux/arm64 windows/amd64]"
time="2024-05-10T08:43:48Z" level=warning msg="platform linux/arm64 cannot pass the validation, kernel support for miscellaneous binary may have not enabled."
time="2024-05-10T08:43:48Z" level=info msg="found 1 workers, default=\"yttpibzlvogkwvo25x638zc8c\""
time="2024-05-10T08:43:48Z" level=warning msg="currently, only the default worker can be used."
panic: invalid freelist page: 0, page type is unknown<00>

goroutine 1 [running, locked to thread]:
go.etcd.io/bbolt.(*freelist).read(0x1ae27d2?, 0x7f21d9b10000)
	/src/vendor/go.etcd.io/bbolt/freelist.go:267 +0x20e
go.etcd.io/bbolt.(*DB).loadFreelist.func1()
	/src/vendor/go.etcd.io/bbolt/db.go:420 +0xb7
sync.(*Once).doSlow(0x410aa5?, 0xd53a00?)
	/usr/local/go/src/sync/once.go:74 +0xbf
sync.(*Once).Do(...)
	/usr/local/go/src/sync/once.go:65
go.etcd.io/bbolt.(*DB).loadFreelist(0xc000536480?)
	/src/vendor/go.etcd.io/bbolt/db.go:413 +0x45
go.etcd.io/bbolt.Open({0xc0009760c0, 0x29}, 0x9760c0?, 0x0)
	/src/vendor/go.etcd.io/bbolt/db.go:295 +0x430
github.com/moby/buildkit/solver/bboltcachestorage.NewStore({0xc0009760c0, 0x29})
	/src/solver/bboltcachestorage/storage.go:26 +0x28
main.newController(0xc000270c40?, 0xc000052700)
	/src/cmd/buildkitd/main.go:786 +0x3aa
main.main.func3(0xc0004b8580?)
	/src/cmd/buildkitd/main.go:327 +0xbc7
github.com/urfave/cli.HandleAction({0x177c8a0?, 0x1b7e760?}, 0xc000472380?)
	/src/vendor/github.com/urfave/cli/app.go:524 +0x50
github.com/urfave/cli.(*App).Run(0xc000472380, {0xc0000500a0, 0x1, 0x1})
	/src/vendor/github.com/urfave/cli/app.go:286 +0x766
main.main()
	/src/cmd/buildkitd/main.go:386 +0x1127
[rootlesskit:child ] error: command [buildkitd] exited: exit status 2
[rootlesskit:parent] error: child exited: exit status 2
@tonistiigi
Member

I guess as a mitigation we should just log the error and clear the DB if this happens.

> After upgrade to 0.13.2 a week ago

Don't see anything related to v0.13.x. Do you know what is the cause of the crash, or is this outside buildkit?

@davhdavh
Contributor Author

> Don't see anything related to v0.13.x. Do you know what is the cause of the crash, or is this outside buildkit?

Very inconsistent crash: in hundreds of builds it has happened just twice. And I had to delete the cache PVC and create a new clean one.

@AkihiroSuda changed the title from "0.13.2 corrupts cache and cannot start" to "0.13.2 corrupts cache and cannot start (go.etcd.io/bbolt.(*freelist).read, panic: invalid freelist page: 0, page type is unknown<0)" on May 11, 2024
@tonistiigi added this to the v0.14.0 milestone on May 13, 2024
@davhdavh
Contributor Author

FYI: I am now up to 7 times that I have had to nuke the cache.

@tonistiigi
Member

Getting this error implies that the daemon hit a panic (or SIGKILL) while it was shutting down, which left the DB in a corrupt state for the next startup. If it was a panic, what was the cause (trace + error)?
