restic fails on repo mounted via CIFS/samba on Linux using go 1.14 build #2659
On Linux, CIFS (SMB) seems to be incompatible with the async preemption implementation of Go 1.14. CIFS does not seem to restart syscalls (open, read, chmod, readdir, ...) as Go expects: Go sets SA_RESTART for its signal handler so that syscalls are restarted automatically. As a result, Go passes lots of EINTR return codes up to restic. See restic#2659 for a detailed explanation.
Hi, in that configuration the data and restic are on a local disk and the restic destination repository is on a Samba share ("Samba write"), go version go1.15.2 linux/amd64. I did another backup from a music server (Bluesound) where the data are remote and restic and the repo are local ("Samba read"). Without GODEBUG=asyncpreemptoff=1 I got a few warnings, but the repo check was OK. With GODEBUG=asyncpreemptoff=1 I got no problems. restic 0.10.0 compiled with go1.15.2 on linux/arm64
Reported by @Mikescher in #2968
Analysis by @greatroar in #2968:
The consensus among maintainers (expressed in #3061) is to wait until Go 1.16, which will include a thorough fix in the stdlib. If that's right, may I suggest adding a summary of (or a link to) the workaround at the top and pinning this issue? It comes up regularly and will continue to do so until around February.
I'll pin this issue for now.
Updates restic#2659. This is one of the cases where the stdlib will not handle EINTR for us, even with Go 1.16. The report for issue restic#2968 shows that xattr calls are directly affected.
Go 1.16 has been released in the meantime. Could anyone check whether this finally solves the compatibility problems with CIFS on Linux?
@csss1234 No. This issue only applies when the host running restic uses Linux.
@underhillian Which kernel version are you using? Did anyone else notice problems with restic + CIFS + Go 1.16? We could add a check that refuses to use a repository stored on a CIFS mount on Linux. That would basically enforce the remark that currently exists in the documentation. The big downside of this workaround is that we'd have no idea at all whether that option will still be necessary in the future.

```diff
diff --git a/internal/backend/local/local.go b/internal/backend/local/local.go
index 0410e51b..d58d4880 100644
--- a/internal/backend/local/local.go
+++ b/internal/backend/local/local.go
@@ -5,6 +5,8 @@ import (
 	"io"
 	"os"
 	"path/filepath"
+	"runtime"
+	"strings"
 	"syscall"
 
 	"github.com/restic/restic/internal/errors"
@@ -31,6 +33,24 @@ const defaultLayout = "default"
 // Open opens the local backend as specified by config.
 func Open(ctx context.Context, cfg Config) (*Local, error) {
 	debug.Log("open local backend at %v (layout %q)", cfg.Path, cfg.Layout)
+
+	if runtime.GOOS == "linux" {
+		dbg, _ := os.LookupEnv("GODEBUG")
+		hasAsyncPreempt := !strings.Contains(dbg, "asyncpreemptoff=1")
+		if hasAsyncPreempt {
+			var stat syscall.Statfs_t
+			err := syscall.Statfs(cfg.Path, &stat)
+			if err != nil {
+				return nil, err
+			}
+			const CIFS_MAGIC_NUMBER = 0xff534d42
+			if stat.Type == CIFS_MAGIC_NUMBER {
+				return nil, errors.Fatal("Storing a repository on CIFS requires disabling " +
+					"asynchronous preemption by setting the environment variable GODEBUG to 'asyncpreemptoff=1'.")
+			}
+		}
+	}
+
 	l, err := backend.ParseLayout(ctx, &backend.LocalFilesystem{}, cfg.Layout, defaultLayout, cfg.Path)
 	if err != nil {
 		return nil, err
```
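The diff above detects the workaround with `strings.Contains`, which also matches settings such as `xasyncpreemptoff=1` or `asyncpreemptoff=1,asyncpreemptoff=0`. A stricter check could parse GODEBUG as a comma-separated list of `name=value` pairs. This is only a sketch: `godebugValue` and `asyncPreemptDisabled` are hypothetical names, and it assumes that a later entry overrides an earlier one and that any non-zero integer disables async preemption, which is how the Go runtime appears to treat the setting.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// godebugValue returns the value of key in a GODEBUG-style string of
// comma-separated name=value pairs. If key occurs more than once, the last
// occurrence wins (assumption: mirrors the runtime's precedence).
func godebugValue(godebug, key string) (string, bool) {
	val, found := "", false
	for _, field := range strings.Split(godebug, ",") {
		k, v, ok := strings.Cut(field, "=")
		if ok && k == key {
			val, found = v, true
		}
	}
	return val, found
}

// asyncPreemptDisabled reports whether godebug turns off asynchronous
// preemption, treating any non-zero integer value as "off" (assumption).
func asyncPreemptDisabled(godebug string) bool {
	v, ok := godebugValue(godebug, "asyncpreemptoff")
	if !ok {
		return false
	}
	n, err := strconv.Atoi(v)
	return err == nil && n != 0
}

func main() {
	fmt.Println("async preemption disabled:", asyncPreemptDisabled(os.Getenv("GODEBUG")))
}
```

In a check like the one in the diff, `asyncPreemptDisabled(os.Getenv("GODEBUG"))` would replace the substring test.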
I use Arch Linux (rolling updates) and update regularly, so my kernel is almost never more than a couple of weeks behind linux-stable. As best I can tell from my trusty restic backups ;), at the time of my last test as reported above (Mar 07), I was running 5.11.2 (although it might have been a minor release or two later than that; I can't be more precise at this point).
I just upgraded a server at home from Ubuntu LTS 18.04 to 20.04, and now I can't run … I also upgraded restic before trying, so I don't know which upgrade caused the problems. This is all over the … I don't have the repository stored on the CIFS-mounted volume; I am only trying to back up files that are mounted on a CIFS file system.
If I run …
My kernel is …
Please specify …
0.12.1 via the self-update subcommand
Hmm, when I write a test that specifically targets the same file that fails with an I/O error inside restic, it does not cause an I/O error. I might have missed something. restic debug log:
Simple test:

```go
package restic

import (
	"os"
	"syscall"
	"testing"

	"github.com/davecgh/go-spew/spew"
	"github.com/restic/restic/internal/restic"
)

const (
	fn = "/media/dubcube/homes/thomasf/Pictures/_VERY_RANDOM/1999/08/17/jw41.jpg"
)

func TestNodeFromFileInfo(t *testing.T) {
	fi, err := os.Stat(fn)
	if err != nil {
		t.Fatal(err)
	}
	stat, ok := toStatT(fi.Sys())
	t.Log(ok, stat)
	if !ok {
		t.Fail()
	}
	node, err := restic.NodeFromFileInfo(fn, fi)
	if err != nil {
		t.Fatal(err)
	}
	t.Log(spew.Sdump(node))
}

type statT syscall.Stat_t

func toStatT(i interface{}) (*statT, bool) {
	s, ok := i.(*syscall.Stat_t)
	if ok && s != nil {
		return (*statT)(s), true
	}
	return nil, false
}
```

Test output:
Is it always the same file that is affected, or does the error affect different files each time? The test case probably only works in the former case. Another major difference from the situation in restic is that the overall load is completely different: restic usually keeps the system quite busy, whereas the test case is a single command that puts no noteworthy stress on the system. As we know that the CIFS problems are triggered by Go's async preemption feature, your test case would have to run the problematic syscall at the moment the preemption signal arrives. For that, the test case would at least have to repeat the operation over and over again (probably on different files). Please test whether setting …
It looks like it's different files on every run. It is a pretty slow home server (Intel(R) Core(TM)2 Duo CPU E6850 @ 3.00GHz), so it is probably more sensitive to timing than the average modern machine. I also tried setting all the archiver concurrencies in the code to 1 and adding a few … I will try to set up a smaller test repo where I don't have tens of thousands of files for every run and see if I can narrow anything down.
I have tried a bunch of different CIFS mount settings now without any luck. A flag that lets you just skip extended attributes would also work (at least for me). It is only the … I just did this to see what would happen, and now I can at least run my backups without many thousands of errors:

```go
func (node *Node) fillExtendedAttributes(path string) error {
	return nil
}
```

So this is probably not the same fault that this issue is really about, but it is somewhat related. As far as I can tell, the syscalls do not time out at all; they actually fail with an I/O error, which is also what strace reports. Update: After changing to not doing the …
Did you try the …
The symptoms sound like a serious bug in either the CIFS server or the client. Or maybe it is also a network problem. Although the server's CPU isn't the fastest, it's not the slowest one either.
I missed that one, and it also seems to work. I have been running through a few tens of thousands of files already, and it definitely should have failed by now.
I ran a few network load tests and even upgraded the switch firmware, so it's probably not that. It's probably some interaction between the Samba/kernel version of my QNAP NAS and the CIFS client in the latest Ubuntu LTS kernel. In any case …
After many months of using …
Same issue here. I'm trying to back up 1.5 TB of data, and the process is interrupted all the time, on different files.
@imkyaky As mentioned in one of the earlier posts here, Go 1.16 includes some additional fixes to handle the interruptions. Can you check whether restic 0.12.0 or 0.12.1 built using Go 1.16 (or, even better, 1.17) resolves the problem? Otherwise, also try setting the environment variable …
Relevant: rclone/rclone#2042. If rclone gets SMB support, this issue can be worked around, and the workaround will work on all platforms. I'm not volunteering, but if anyone needs a summer project...
Is this issue still relevant with recent restic versions, that is, restic 0.14.0?
I created a small (~25GB) repo as a test case and went through a series of backup, forget, and prune operations similar to those where I originally saw errors. Everything worked perfectly. I performed the testing using …
under Linux 6.06. I did not set GODEBUG. This wasn't an exhaustive test by any means, but it was extensive enough that (based on my previous experience) I would have expected to see multiple errors if the issue were still present. So the issue has very likely been resolved.
The manual doesn't mention that this is also an issue when reading data from a CIFS share.
Does this mean it's fixed in newer kernels? It would make sense to document which kernel version fixed it.
Unfortunately, we have no real idea which versions are affected; I'm not actually sure how the problem can be reproduced efficiently. Recent Linux (kernel) versions seem to be unproblematic.
Just wanted to note that rclone has merged support for SMB. |
This issue is a summary of https://forum.restic.net/t/prune-fails-on-cifs-repo-using-go-1-14-build/2579 , intended as a reference for the underlying problem.
tl;dr Workaround: Setting the environment variable `GODEBUG` to `asyncpreemptoff=1` restores the pre-Go 1.14 behavior and fixes the problem.
Output of `restic version`:
restic 0.9.6 (v0.9.6-137-gc542a509) compiled with go1.14 on linux/amd64
Linux Kernel version: 5.5.9
How did you run restic exactly?
restic prune -r /path/to/repository/on/CIFS/share
Relevant log excerpt:
Prune failed in the end
Further relevant log excerpts:
What backend/server/service did you use to store the repository?
Local backend stored on a CIFS share
Expected behavior
No warnings, prune should complete.
Actual behavior
Prune failed.
Steps to reproduce the behavior
Build restic using Go 1.14 and store the backup repository on a CIFS share.
Do you have any idea what may have caused this?
This issue is a side effect of asynchronous preemption in Go 1.14. The [release notes](https://golang.org/doc/go1.14#runtime) state the following:
Go configures its signal handlers to restart syscalls if possible, and the standard library also retries syscalls when necessary. That is, problems should only arise when calling low-level syscalls directly, and in that case one should simply handle EINTR properly. However, restic only uses Go standard library functions, which should already handle EINTR where necessary.
The first prune error message points to an `os.Open` call (via `fs.Open`) in the `Load` function of the local backend. So it looks like a Go standard library call fails. However, the manpage for signal (`man 7 signal`) states that the `open` syscall, which is called underneath, is restarted when `SA_RESTART` is set, as Go does. So this seems to be a bug in the Linux kernel. Adding a loop around the call to `fs.Open` that repeats it as long as EINTR is returned fixes that one call. Fixing all problematic calls would end up adding lots of ugly loops and playing whack-a-mole. The manpages of `lstat`, `readdir` and `chmod` don't even list EINTR as a possible errno.
Do you have an idea how to solve the issue?
Setting the environment variable `GODEBUG` to `asyncpreemptoff=1` restores the pre-Go 1.14 behavior and fixes the problem. Go relies on the assumption that the kernel properly restarts syscalls when told to do so. As the latter is obviously not the case, the proper fix would be to submit a bug report to the Linux kernel.
A short-term solution would be to add a note to the restic documentation that mentions the compatibility problem with CIFS mounts.